All of lore.kernel.org
 help / color / mirror / Atom feed
* Possible regression with skb_clone() in 2.6.36
@ 2010-09-08 23:32 ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-08 23:32 UTC (permalink / raw)
  To: linux-bluetooth; +Cc: netdev, linux-kernel, marcel, davem

I've been experiencing some problems when running the L2CAP Streaming mode in
2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
wasn't happening before, so I think we may have a regression here (I didn't
find where yet). The crash log is below.

The following patch does not fix the regression, but shows that removing the
skb_clone() call from l2cap_streaming_send() we workaround the problem. The
patch is good anyway because it saves memory and time.

By now I have no idea on how to fix this.


[ 5066.137533] Bluetooth: L2CAP ver 2.15
[ 5066.137873] Bluetooth: L2CAP socket layer initialized
[ 5066.545179] Bluetooth: RFCOMM TTY layer initialized
[ 5066.545879] Bluetooth: RFCOMM socket layer initialized
[ 5066.546582] Bluetooth: RFCOMM ver 1.11
[ 5092.268021] l2test invoked oom-killer: gfp_mask=0x4d0, order=0, oom_adj=0, oom_sc
ore_adj=0
[ 5092.268872] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
[ 5092.269863] Call Trace:
[ 5092.270265]  [<ffffffff8138b6a6>] ? _raw_spin_unlock+0x26/0x30
[ 5092.270878]  [<ffffffff810c0827>] T.427+0x77/0x1e0
[ 5092.271874]  [<ffffffff811b85e7>] ? security_real_capable_noaudit+0x37/0x60
[ 5092.272956]  [<ffffffff810c0e3a>] out_of_memory+0x2ca/0x2f0
[ 5092.273894]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
[ 5092.274871]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
[ 5092.275864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
[ 5092.276876]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
[ 5092.277865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.278652]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
[ 5092.278927]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.279864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
[ 5092.280864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
[ 5092.281867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
[ 5092.282865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
[ 5092.283932]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.284865]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
[ 5092.285652]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.285864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.286864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
[ 5092.287864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
[ 5092.288872]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
[ 5092.289927]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
[ 5092.290864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.291649]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.291864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.292864]  [<ffffffff8130740f>] sys_send+0xf/0x20
[ 5092.293870]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
[ 5092.294875]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30
[ 5092.295673] Mem-Info:
[ 5092.295863] DMA per-cpu:
[ 5092.296214] CPU    0: hi:    0, btch:   1 usd:   0
[ 5092.296925] DMA32 per-cpu:
[ 5092.297298] CPU    0: hi:   90, btch:  15 usd:  85
[ 5092.297864] active_anon:504 inactive_anon:543 isolated_anon:0
[ 5092.297865]  active_file:14 inactive_file:3 isolated_file:0
[ 5092.297866]  unevictable:0 dirty:0 writeback:3 unstable:0
[ 5092.297866]  free:728 slab_reclaimable:559 slab_unreclaimable:39895
[ 5092.297867]  mapped:12 shmem:35 pagetables:84 bounce:0
[ 5092.299944] DMA free:1064kB min:124kB low:152kB high:184kB active_anon:0kB inacti
ve_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB iso
lated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem
:0kB slab_reclaimable:0kB slab_unreclaimable:14916kB kernel_stack:0kB pagetables:0kB
 unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 5092.301862] lowmem_reserve[]: 0 236 236 236
[ 5092.303134] DMA32 free:1848kB min:1904kB low:2380kB high:2856kB active_anon:2016k
B inactive_anon:2172kB active_file:56kB inactive_file:12kB unevictable:0kB isolated(
anon):0kB isolated(file):0kB present:242380kB mlocked:0kB dirty:0kB writeback:12kB m
apped:48kB shmem:140kB slab_reclaimable:2236kB slab_unreclaimable:144664kB kernel_st
ack:456kB pagetables:336kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1
03 all_unreclaimable? yes
[ 5092.304962] lowmem_reserve[]: 0 0 0 0
[ 5092.306066] DMA: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024k
B 0*2048kB 0*4096kB = 1064kB
[ 5092.307862] DMA32: 10*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*10
24kB 0*2048kB 0*4096kB = 1848kB
[ 5092.309467] 52 total pagecache pages
[ 5092.309862] 0 pages in swap cache
[ 5092.310317] Swap cache stats: add 0, delete 0, find 0/0
[ 5092.310862] Free swap  = 0kB
[ 5092.311260] Total swap = 0kB
[ 5092.313796] 65530 pages RAM
[ 5092.313868] 22603 pages reserved
[ 5092.314324] 206 pages shared
[ 5092.314862] 39353 pages non-shared
[ 5092.315348] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[ 5092.315867] [    1]     0     1      438       19   0       0             0 init
[ 5092.316871] [ 1018]     0  1018      537       75   0     -17         -1000 udevd
[ 5092.318867] [ 2217]   101  2217      606       39   0       0             0 dbus-
daemon
[ 5092.319865] [ 2825]     0  2825      481       29   0       0             0 dhcpc
d
[ 5092.320865] [ 3497]     0  3497     1091       83   0       0             0 sshd
[ 5092.321969] [ 3535]     0  3535      477       31   0       0             0 agett
y
[ 5092.322864] [ 3536]     0  3536      477       30   0       0             0 agett
y
[ 5092.323864] [ 3537]     0  3537      477       29   0       0             0 agett
y
[ 5092.325937] [ 3538]     0  3538      477       31   0       0             0 agett
y
[ 5092.326865] [ 3539]     0  3539      477       31   0       0             0 agett
y
[ 5092.327864] [ 3540]     0  3540      477       31   0       0             0 agett
y
[ 5092.328864] [ 3541]     0  3541     1738      105   0       0             0 sshd
[ 5092.329865] [ 3545]     0  3545      745       72   0       0             0 bash
[ 5092.330864] [ 3550]     0  3550      706       43   0       0             0 scree
n
[ 5092.331866] [ 3551]     0  3551      772      124   0       0             0 scree
n
[ 5092.332965] [ 3552]     0  3552      745       77   0       0             0 bash
[ 5092.333865] [ 3557]     0  3557      745       77   0       0             0 bash
[ 5092.334867] [ 3562]     0  3562      536       75   0     -17         -1000 udevd
[ 5092.336865] [ 3567]     0  3567      536       75   0     -17         -1000 udevd
[ 5092.337865] [ 3887]     0  3887      935       54   0       0             0 bluet
oothd
[ 5092.338864] [ 3896]     0  3896      458       24   0       0             0 l2tes
t
[ 5092.339864] [ 3897]     0  3897      458       24   0       0             0 l2tes
t
[ 5092.340933] [ 3942]     0  3942      458       29   0       0             0 l2tes
t
[ 5092.341863] Kernel panic - not syncing: Out of memory and no killable processes..
.
[ 5092.341864]
[ 5092.342864] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
[ 5092.343862] Call Trace:
[ 5092.344934]  [<ffffffff813884d9>] panic+0x8c/0x199
[ 5092.345621]  [<ffffffff810c0e54>] out_of_memory+0x2e4/0x2f0
[ 5092.345864]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
[ 5092.346894]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
[ 5092.347864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
[ 5092.348871]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
[ 5092.349865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.350628]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
[ 5092.350935]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.351864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
[ 5092.352864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
[ 5092.353867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
[ 5092.354865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
[ 5092.355864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.356864]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
[ 5092.357612]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.357931]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.358864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
[ 5092.359864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
[ 5092.360929]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
[ 5092.361866]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
[ 5092.362864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.363653]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.364868]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.365865]  [<ffffffff8130740f>] sys_send+0xf/0x20
[ 5092.366945]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
[ 5092.367865]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30

Gustavo F. Padovan (1):
      Bluetooth: Simplify L2CAP Streaming mode sending


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Possible regression with skb_clone() in 2.6.36
@ 2010-09-08 23:32 ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-08 23:32 UTC (permalink / raw)
  To: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	marcel-kz+m5ild9QBg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q

I've been experiencing some problems when running the L2CAP Streaming mode in
2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
wasn't happening before, so I think we may have a regression here (I didn't
find where yet). The crash log is below.

The following patch does not fix the regression, but shows that removing the
skb_clone() call from l2cap_streaming_send() we workaround the problem. The
patch is good anyway because it saves memory and time.

By now I have no idea on how to fix this.


[ 5066.137533] Bluetooth: L2CAP ver 2.15
[ 5066.137873] Bluetooth: L2CAP socket layer initialized
[ 5066.545179] Bluetooth: RFCOMM TTY layer initialized
[ 5066.545879] Bluetooth: RFCOMM socket layer initialized
[ 5066.546582] Bluetooth: RFCOMM ver 1.11
[ 5092.268021] l2test invoked oom-killer: gfp_mask=0x4d0, order=0, oom_adj=0, oom_sc
ore_adj=0
[ 5092.268872] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
[ 5092.269863] Call Trace:
[ 5092.270265]  [<ffffffff8138b6a6>] ? _raw_spin_unlock+0x26/0x30
[ 5092.270878]  [<ffffffff810c0827>] T.427+0x77/0x1e0
[ 5092.271874]  [<ffffffff811b85e7>] ? security_real_capable_noaudit+0x37/0x60
[ 5092.272956]  [<ffffffff810c0e3a>] out_of_memory+0x2ca/0x2f0
[ 5092.273894]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
[ 5092.274871]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
[ 5092.275864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
[ 5092.276876]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
[ 5092.277865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.278652]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
[ 5092.278927]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.279864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
[ 5092.280864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
[ 5092.281867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
[ 5092.282865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
[ 5092.283932]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.284865]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
[ 5092.285652]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.285864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.286864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
[ 5092.287864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
[ 5092.288872]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
[ 5092.289927]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
[ 5092.290864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.291649]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.291864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.292864]  [<ffffffff8130740f>] sys_send+0xf/0x20
[ 5092.293870]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
[ 5092.294875]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30
[ 5092.295673] Mem-Info:
[ 5092.295863] DMA per-cpu:
[ 5092.296214] CPU    0: hi:    0, btch:   1 usd:   0
[ 5092.296925] DMA32 per-cpu:
[ 5092.297298] CPU    0: hi:   90, btch:  15 usd:  85
[ 5092.297864] active_anon:504 inactive_anon:543 isolated_anon:0
[ 5092.297865]  active_file:14 inactive_file:3 isolated_file:0
[ 5092.297866]  unevictable:0 dirty:0 writeback:3 unstable:0
[ 5092.297866]  free:728 slab_reclaimable:559 slab_unreclaimable:39895
[ 5092.297867]  mapped:12 shmem:35 pagetables:84 bounce:0
[ 5092.299944] DMA free:1064kB min:124kB low:152kB high:184kB active_anon:0kB inacti
ve_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB iso
lated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem
:0kB slab_reclaimable:0kB slab_unreclaimable:14916kB kernel_stack:0kB pagetables:0kB
 unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 5092.301862] lowmem_reserve[]: 0 236 236 236
[ 5092.303134] DMA32 free:1848kB min:1904kB low:2380kB high:2856kB active_anon:2016k
B inactive_anon:2172kB active_file:56kB inactive_file:12kB unevictable:0kB isolated(
anon):0kB isolated(file):0kB present:242380kB mlocked:0kB dirty:0kB writeback:12kB m
apped:48kB shmem:140kB slab_reclaimable:2236kB slab_unreclaimable:144664kB kernel_st
ack:456kB pagetables:336kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1
03 all_unreclaimable? yes
[ 5092.304962] lowmem_reserve[]: 0 0 0 0
[ 5092.306066] DMA: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024k
B 0*2048kB 0*4096kB = 1064kB
[ 5092.307862] DMA32: 10*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*10
24kB 0*2048kB 0*4096kB = 1848kB
[ 5092.309467] 52 total pagecache pages
[ 5092.309862] 0 pages in swap cache
[ 5092.310317] Swap cache stats: add 0, delete 0, find 0/0
[ 5092.310862] Free swap  = 0kB
[ 5092.311260] Total swap = 0kB
[ 5092.313796] 65530 pages RAM
[ 5092.313868] 22603 pages reserved
[ 5092.314324] 206 pages shared
[ 5092.314862] 39353 pages non-shared
[ 5092.315348] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[ 5092.315867] [    1]     0     1      438       19   0       0             0 init
[ 5092.316871] [ 1018]     0  1018      537       75   0     -17         -1000 udevd
[ 5092.318867] [ 2217]   101  2217      606       39   0       0             0 dbus-
daemon
[ 5092.319865] [ 2825]     0  2825      481       29   0       0             0 dhcpc
d
[ 5092.320865] [ 3497]     0  3497     1091       83   0       0             0 sshd
[ 5092.321969] [ 3535]     0  3535      477       31   0       0             0 agett
y
[ 5092.322864] [ 3536]     0  3536      477       30   0       0             0 agett
y
[ 5092.323864] [ 3537]     0  3537      477       29   0       0             0 agett
y
[ 5092.325937] [ 3538]     0  3538      477       31   0       0             0 agett
y
[ 5092.326865] [ 3539]     0  3539      477       31   0       0             0 agett
y
[ 5092.327864] [ 3540]     0  3540      477       31   0       0             0 agett
y
[ 5092.328864] [ 3541]     0  3541     1738      105   0       0             0 sshd
[ 5092.329865] [ 3545]     0  3545      745       72   0       0             0 bash
[ 5092.330864] [ 3550]     0  3550      706       43   0       0             0 scree
n
[ 5092.331866] [ 3551]     0  3551      772      124   0       0             0 scree
n
[ 5092.332965] [ 3552]     0  3552      745       77   0       0             0 bash
[ 5092.333865] [ 3557]     0  3557      745       77   0       0             0 bash
[ 5092.334867] [ 3562]     0  3562      536       75   0     -17         -1000 udevd
[ 5092.336865] [ 3567]     0  3567      536       75   0     -17         -1000 udevd
[ 5092.337865] [ 3887]     0  3887      935       54   0       0             0 bluet
oothd
[ 5092.338864] [ 3896]     0  3896      458       24   0       0             0 l2tes
t
[ 5092.339864] [ 3897]     0  3897      458       24   0       0             0 l2tes
t
[ 5092.340933] [ 3942]     0  3942      458       29   0       0             0 l2tes
t
[ 5092.341863] Kernel panic - not syncing: Out of memory and no killable processes..
.
[ 5092.341864]
[ 5092.342864] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
[ 5092.343862] Call Trace:
[ 5092.344934]  [<ffffffff813884d9>] panic+0x8c/0x199
[ 5092.345621]  [<ffffffff810c0e54>] out_of_memory+0x2e4/0x2f0
[ 5092.345864]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
[ 5092.346894]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
[ 5092.347864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
[ 5092.348871]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
[ 5092.349865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.350628]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
[ 5092.350935]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.351864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
[ 5092.352864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
[ 5092.353867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
[ 5092.354865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
[ 5092.355864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.356864]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
[ 5092.357612]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.357931]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.358864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
[ 5092.359864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
[ 5092.360929]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
[ 5092.361866]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
[ 5092.362864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.363653]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.364868]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.365865]  [<ffffffff8130740f>] sys_send+0xf/0x20
[ 5092.366945]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
[ 5092.367865]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30

Gustavo F. Padovan (1):
      Bluetooth: Simplify L2CAP Streaming mode sending

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] Bluetooth: Simplify L2CAP Streaming mode sending
  2010-09-08 23:32 ` Gustavo F. Padovan
  (?)
@ 2010-09-08 23:32 ` Gustavo F. Padovan
  -1 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-08 23:32 UTC (permalink / raw)
  To: linux-bluetooth; +Cc: netdev, linux-kernel, marcel, davem

As we don't have any error control on the Streaming mode, i.e., we don't
need to keep a copy of the skb for later resending we don't need to
call skb_clone() on it.
Then we can go one further here, and dequeue the skb before sending it,
that also means we don't need to look to sk->sk_send_head anymore.

The patch saves memory and time when sending Streaming mode data, so
it is good to mainline.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
---
 net/bluetooth/l2cap.c |   24 +++++++-----------------
 1 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index 9fad312..f2062aa 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -1441,33 +1441,23 @@ static inline void l2cap_do_send(struct sock *sk, struct sk_buff *skb)
 
 static void l2cap_streaming_send(struct sock *sk)
 {
-	struct sk_buff *skb, *tx_skb;
+	struct sk_buff *skb;
 	struct l2cap_pinfo *pi = l2cap_pi(sk);
 	u16 control, fcs;
 
-	while ((skb = sk->sk_send_head)) {
-		tx_skb = skb_clone(skb, GFP_ATOMIC);
-
-		control = get_unaligned_le16(tx_skb->data + L2CAP_HDR_SIZE);
+	while ((skb = skb_dequeue(TX_QUEUE(sk)))) {
+		control = get_unaligned_le16(skb->data + L2CAP_HDR_SIZE);
 		control |= pi->next_tx_seq << L2CAP_CTRL_TXSEQ_SHIFT;
-		put_unaligned_le16(control, tx_skb->data + L2CAP_HDR_SIZE);
+		put_unaligned_le16(control, skb->data + L2CAP_HDR_SIZE);
 
 		if (pi->fcs == L2CAP_FCS_CRC16) {
-			fcs = crc16(0, (u8 *)tx_skb->data, tx_skb->len - 2);
-			put_unaligned_le16(fcs, tx_skb->data + tx_skb->len - 2);
+			fcs = crc16(0, (u8 *)skb->data, skb->len - 2);
+			put_unaligned_le16(fcs, skb->data + skb->len - 2);
 		}
 
-		l2cap_do_send(sk, tx_skb);
+		l2cap_do_send(sk, skb);
 
 		pi->next_tx_seq = (pi->next_tx_seq + 1) % 64;
-
-		if (skb_queue_is_last(TX_QUEUE(sk), skb))
-			sk->sk_send_head = NULL;
-		else
-			sk->sk_send_head = skb_queue_next(TX_QUEUE(sk), skb);
-
-		skb = skb_dequeue(TX_QUEUE(sk));
-		kfree_skb(skb);
 	}
 }
 
-- 
1.7.2.2


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-10 10:15   ` Jarek Poplawski
  0 siblings, 0 replies; 21+ messages in thread
From: Jarek Poplawski @ 2010-09-10 10:15 UTC (permalink / raw)
  To: Gustavo F. Padovan; +Cc: netdev, linux-kernel, marcel, davem, linux-bluetooth

On 2010-09-09 01:32, Gustavo F. Padovan wrote:
> I've been experiencing some problems when running the L2CAP Streaming mode in
> 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> wasn't happening before, so I think we may have a regression here (I didn't
> find where yet). The crash log is below.
> 
> The following patch does not fix the regression, but shows that removing the
> skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> patch is good anyway because it saves memory and time.
> 
> By now I have no idea on how to fix this.

My suggestion is to check the length of conn->data_q in hci_send_acl()
while this happens, and maybe try to set some limit here.

Jarek P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-10 10:15   ` Jarek Poplawski
  0 siblings, 0 replies; 21+ messages in thread
From: Jarek Poplawski @ 2010-09-10 10:15 UTC (permalink / raw)
  To: Gustavo F. Padovan
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	marcel-kz+m5ild9QBg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA

On 2010-09-09 01:32, Gustavo F. Padovan wrote:
> I've been experiencing some problems when running the L2CAP Streaming mode in
> 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> wasn't happening before, so I think we may have a regression here (I didn't
> find where yet). The crash log is below.
> 
> The following patch does not fix the regression, but shows that removing the
> skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> patch is good anyway because it saves memory and time.
> 
> By now I have no idea on how to fix this.

My suggestion is to check the length of conn->data_q in hci_send_acl()
while this happens, and maybe try to set some limit here.

Jarek P.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-10 16:53   ` Mat Martineau
  0 siblings, 0 replies; 21+ messages in thread
From: Mat Martineau @ 2010-09-10 16:53 UTC (permalink / raw)
  To: Gustavo F. Padovan; +Cc: linux-bluetooth, netdev, linux-kernel, marcel, davem


Gustavo -

I'm not sure why the streaming code used to work, but this does not 
look like an skb_clone() problem.  Your patch to remove the 
skb_clone() call in l2cap_streaming_send() addresses the root cause of 
this crash.

On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:

> I've been experiencing some problems when running the L2CAP Streaming mode in
> 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> wasn't happening before, so I think we may have a regression here (I didn't
> find where yet). The crash log is below.
>
> The following patch does not fix the regression, but shows that removing the
> skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> patch is good anyway because it saves memory and time.
>
> By now I have no idea on how to fix this.
>
> <snip>

This has to do with the sk->sk_wmem_alloc accounting that controls the 
amount of write buffer space used on the socket.

When the L2CAP streaming mode socket segments its data, it allocates 
memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before 
that allocation call returns, skb_set_owner_w() is called on the new 
skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that 
sk->sk_wmem_alloc is correctly updated when the skb is freed.

When that skb is cloned, the clone is not "owned" by the write buffer. 
The clone's destructor is set to NULL in __skb_clone().  The version 
of l2cap_streaming_send() that runs out of memory is passing the 
non-owned skb clone down to the HCI layer.  The original skb (the one 
that's "owned by w") is immediately freed, which adjusts 
sk->sk_wmem_alloc back down - the socket thinks it has unlimited write 
buffer space.  As a result, bt_skb_send_alloc() never blocks waiting 
for buffer space (or returns EAGAIN for nonblocking writes) and the 
HCI send queue keeps growing.

This isn't a problem for the ERTM sends, because the original skbs are 
kept in the ERTM tx queue until they are acked.  Once they're acked, 
the write buffer space is freed and additional skbs can be allocated.

Regards,

--
Mat Martineau
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-10 16:53   ` Mat Martineau
  0 siblings, 0 replies; 21+ messages in thread
From: Mat Martineau @ 2010-09-10 16:53 UTC (permalink / raw)
  To: Gustavo F. Padovan
  Cc: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	marcel-kz+m5ild9QBg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q


Gustavo -

I'm not sure why the streaming code used to work, but this does not 
look like an skb_clone() problem.  Your patch to remove the 
skb_clone() call in l2cap_streaming_send() addresses the root cause of 
this crash.

On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:

> I've been experiencing some problems when running the L2CAP Streaming mode in
> 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> wasn't happening before, so I think we may have a regression here (I didn't
> find where yet). The crash log is below.
>
> The following patch does not fix the regression, but shows that removing the
> skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> patch is good anyway because it saves memory and time.
>
> By now I have no idea on how to fix this.
>
> <snip>

This has to do with the sk->sk_wmem_alloc accounting that controls the 
amount of write buffer space used on the socket.

When the L2CAP streaming mode socket segments its data, it allocates 
memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before 
that allocation call returns, skb_set_owner_w() is called on the new 
skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that 
sk->sk_wmem_alloc is correctly updated when the skb is freed.

When that skb is cloned, the clone is not "owned" by the write buffer. 
The clone's destructor is set to NULL in __skb_clone().  The version 
of l2cap_streaming_send() that runs out of memory is passing the 
non-owned skb clone down to the HCI layer.  The original skb (the one 
that's "owned by w") is immediately freed, which adjusts 
sk->sk_wmem_alloc back down - the socket thinks it has unlimited write 
buffer space.  As a result, bt_skb_send_alloc() never blocks waiting 
for buffer space (or returns EAGAIN for nonblocking writes) and the 
HCI send queue keeps growing.

This isn't a problem for the ERTM sends, because the original skbs are 
kept in the ERTM tx queue until they are acked.  Once they're acked, 
the write buffer space is freed and additional skbs can be allocated.

Regards,

--
Mat Martineau
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
  2010-09-10 16:53   ` Mat Martineau
@ 2010-09-10 19:45     ` Gustavo F. Padovan
  -1 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-10 19:45 UTC (permalink / raw)
  To: Mat Martineau; +Cc: linux-bluetooth, netdev, linux-kernel, marcel, davem

Hi Mat,

* Mat Martineau <mathewm@codeaurora.org> [2010-09-10 09:53:31 -0700]:

> 
> Gustavo -
> 
> I'm not sure why the streaming code used to work, but this does not 
> look like an skb_clone() problem.  Your patch to remove the 
> skb_clone() call in l2cap_streaming_send() addresses the root cause of 
> this crash.
> 
> On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
> 
> > I've been experiencing some problems when running the L2CAP Streaming mode in
> > 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> > wasn't happening before, so I think we may have a regression here (I didn't
> > find where yet). The crash log is below.
> >
> > The following patch does not fix the regression, but shows that removing the
> > skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> > patch is good anyway because it saves memory and time.
> >
> > By now I have no idea on how to fix this.
> >
> > <snip>
> 
> This has to do with the sk->sk_wmem_alloc accounting that controls the 
> amount of write buffer space used on the socket.
> 
> When the L2CAP streaming mode socket segments its data, it allocates 
> memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before 
> that allocation call returns, skb_set_owner_w() is called on the new 
> skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that 
> sk->sk_wmem_alloc is correctly updated when the skb is freed.
> 
> When that skb is cloned, the clone is not "owned" by the write buffer. 
> The clone's destructor is set to NULL in __skb_clone().  The version 
> of l2cap_streaming_send() that runs out of memory is passing the 
> non-owned skb clone down to the HCI layer.  The original skb (the one 
> that's "owned by w") is immediately freed, which adjusts 
> sk->sk_wmem_alloc back down - the socket thinks it has unlimited write 
> buffer space.  As a result, bt_skb_send_alloc() never blocks waiting 
> for buffer space (or returns EAGAIN for nonblocking writes) and the 
> HCI send queue keeps growing.

If the problem is what you are saying, add a skb_set_owner_w(skb, sk) on
the cloned skb should solve the problem, but it doesn't. That's exactly
what tcp_transmit_skb() does. Also that just appeared in 2.6.36, is was
working fine before, i.e, we have a regression here. 

> 
> This isn't a problem for the ERTM sends, because the original skbs are 
> kept in the ERTM tx queue until they are acked.  Once they're acked, 
> the write buffer space is freed and additional skbs can be allocated.

It affects ERTM as well, but in that case the kernel doesn't crash
because ERTM block on sending trying to allocate memory. Then we are not
able to receive any ack (everything stays queued in sk_backlog_queue as
the sk is owned by the user) and ERTM stalls.

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-10 19:45     ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-10 19:45 UTC (permalink / raw)
  To: Mat Martineau
  Cc: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	marcel-kz+m5ild9QBg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q

Hi Mat,

* Mat Martineau <mathewm-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org> [2010-09-10 09:53:31 -0700]:

> 
> Gustavo -
> 
> I'm not sure why the streaming code used to work, but this does not 
> look like an skb_clone() problem.  Your patch to remove the 
> skb_clone() call in l2cap_streaming_send() addresses the root cause of 
> this crash.
> 
> On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
> 
> > I've been experiencing some problems when running the L2CAP Streaming mode in
> > 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> > wasn't happening before, so I think we may have a regression here (I didn't
> > find where yet). The crash log is below.
> >
> > The following patch does not fix the regression, but shows that removing the
> > skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> > patch is good anyway because it saves memory and time.
> >
> > By now I have no idea on how to fix this.
> >
> > <snip>
> 
> This has to do with the sk->sk_wmem_alloc accounting that controls the 
> amount of write buffer space used on the socket.
> 
> When the L2CAP streaming mode socket segments its data, it allocates 
> memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before 
> that allocation call returns, skb_set_owner_w() is called on the new 
> skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that 
> sk->sk_wmem_alloc is correctly updated when the skb is freed.
> 
> When that skb is cloned, the clone is not "owned" by the write buffer. 
> The clone's destructor is set to NULL in __skb_clone().  The version 
> of l2cap_streaming_send() that runs out of memory is passing the 
> non-owned skb clone down to the HCI layer.  The original skb (the one 
> that's "owned by w") is immediately freed, which adjusts 
> sk->sk_wmem_alloc back down - the socket thinks it has unlimited write 
> buffer space.  As a result, bt_skb_send_alloc() never blocks waiting 
> for buffer space (or returns EAGAIN for nonblocking writes) and the 
> HCI send queue keeps growing.

If the problem is what you are saying, add a skb_set_owner_w(skb, sk) on
the cloned skb should solve the problem, but it doesn't. That's exactly
what tcp_transmit_skb() does. Also that just appeared in 2.6.36, is was
working fine before, i.e, we have a regression here. 

> 
> This isn't a problem for the ERTM sends, because the original skbs are 
> kept in the ERTM tx queue until they are acked.  Once they're acked, 
> the write buffer space is freed and additional skbs can be allocated.

It affects ERTM as well, but in that case the kernel doesn't crash
because ERTM block on sending trying to allocate memory. Then we are not
able to receive any ack (everything stays queued in sk_backlog_queue as
the sk is owned by the user) and ERTM stalls.

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
  2010-09-10 19:45     ` Gustavo F. Padovan
@ 2010-09-15 18:42       ` Gustavo F. Padovan
  -1 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-15 18:42 UTC (permalink / raw)
  To: Mat Martineau; +Cc: linux-bluetooth, netdev, linux-kernel, marcel, davem

Hi Mat,

* Gustavo F. Padovan <padovan@profusion.mobi> [2010-09-10 16:45:09 -0300]:

> Hi Mat,
> 
> * Mat Martineau <mathewm@codeaurora.org> [2010-09-10 09:53:31 -0700]:
> 
> > 
> > Gustavo -
> > 
> > I'm not sure why the streaming code used to work, but this does not 
> > look like an skb_clone() problem.  Your patch to remove the 
> > skb_clone() call in l2cap_streaming_send() addresses the root cause of 
> > this crash.
> > 
> > On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
> > 
> > > I've been experiencing some problems when running the L2CAP Streaming mode in
> > > 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> > > wasn't happening before, so I think we may have a regression here (I didn't
> > > find where yet). The crash log is below.
> > >
> > > The following patch does not fix the regression, but shows that removing the
> > > skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> > > patch is good anyway because it saves memory and time.
> > >
> > > By now I have no idea on how to fix this.
> > >
> > > <snip>
> > 
> > This has to do with the sk->sk_wmem_alloc accounting that controls the 
> > amount of write buffer space used on the socket.
> > 
> > When the L2CAP streaming mode socket segments its data, it allocates 
> > memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before 
> > that allocation call returns, skb_set_owner_w() is called on the new 
> > skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that 
> > sk->sk_wmem_alloc is correctly updated when the skb is freed.
> > 
> > When that skb is cloned, the clone is not "owned" by the write buffer. 
> > The clone's destructor is set to NULL in __skb_clone().  The version 
> > of l2cap_streaming_send() that runs out of memory is passing the 
> > non-owned skb clone down to the HCI layer.  The original skb (the one 
> > that's "owned by w") is immediately freed, which adjusts 
> > sk->sk_wmem_alloc back down - the socket thinks it has unlimited write 
> > buffer space.  As a result, bt_skb_send_alloc() never blocks waiting 
> > for buffer space (or returns EAGAIN for nonblocking writes) and the 
> > HCI send queue keeps growing.
> 
> If the problem is what you are saying, add a skb_set_owner_w(skb, sk) on
> the cloned skb should solve the problem, but it doesn't. That's exactly
> what tcp_transmit_skb() does. Also that just appeared in 2.6.36, is was
> working fine before, i.e, we have a regression here. 

I've run some other tests and what you said also fixes the problem for
Streaming Mode. If I use skb_set_owner_w() on the cloned skb, everything
works fine. But we still have the problem for ERTM as I described.
send() blocks wainting for memory. The regression is there yet.
:(

> 
> > 
> > This isn't a problem for the ERTM sends, because the original skbs are 
> > kept in the ERTM tx queue until they are acked.  Once they're acked, 
> > the write buffer space is freed and additional skbs can be allocated.
> 
> It affects ERTM as well, but in that case the kernel doesn't crash
> because ERTM block on sending trying to allocate memory. Then we are not
> able to receive any ack (everything stays queued in sk_backlog_queue as
> the sk is owned by the user) and ERTM stalls.
> 

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-15 18:42       ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-15 18:42 UTC (permalink / raw)
  To: Mat Martineau
  Cc: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	marcel-kz+m5ild9QBg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q

Hi Mat,

* Gustavo F. Padovan <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org> [2010-09-10 16:45:09 -0300]:

> Hi Mat,
> 
> * Mat Martineau <mathewm-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org> [2010-09-10 09:53:31 -0700]:
> 
> > 
> > Gustavo -
> > 
> > I'm not sure why the streaming code used to work, but this does not 
> > look like an skb_clone() problem.  Your patch to remove the 
> > skb_clone() call in l2cap_streaming_send() addresses the root cause of 
> > this crash.
> > 
> > On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
> > 
> > > I've been experiencing some problems when running the L2CAP Streaming mode in
> > > 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> > > wasn't happening before, so I think we may have a regression here (I didn't
> > > find where yet). The crash log is below.
> > >
> > > The following patch does not fix the regression, but shows that removing the
> > > skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> > > patch is good anyway because it saves memory and time.
> > >
> > > By now I have no idea on how to fix this.
> > >
> > > <snip>
> > 
> > This has to do with the sk->sk_wmem_alloc accounting that controls the 
> > amount of write buffer space used on the socket.
> > 
> > When the L2CAP streaming mode socket segments its data, it allocates 
> > memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before 
> > that allocation call returns, skb_set_owner_w() is called on the new 
> > skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that 
> > sk->sk_wmem_alloc is correctly updated when the skb is freed.
> > 
> > When that skb is cloned, the clone is not "owned" by the write buffer. 
> > The clone's destructor is set to NULL in __skb_clone().  The version 
> > of l2cap_streaming_send() that runs out of memory is passing the 
> > non-owned skb clone down to the HCI layer.  The original skb (the one 
> > that's "owned by w") is immediately freed, which adjusts 
> > sk->sk_wmem_alloc back down - the socket thinks it has unlimited write 
> > buffer space.  As a result, bt_skb_send_alloc() never blocks waiting 
> > for buffer space (or returns EAGAIN for nonblocking writes) and the 
> > HCI send queue keeps growing.
> 
> If the problem is what you are saying, add a skb_set_owner_w(skb, sk) on
> the cloned skb should solve the problem, but it doesn't. That's exactly
> what tcp_transmit_skb() does. Also that just appeared in 2.6.36, is was
> working fine before, i.e, we have a regression here. 

I've run some other tests and what you said also fixes the problem for
Streaming Mode. If I use skb_set_owner_w() on the cloned skb, everything
works fine. But we still have the problem for ERTM as I described.
send() blocks wainting for memory. The regression is there yet.
:(

> 
> > 
> > This isn't a problem for the ERTM sends, because the original skbs are 
> > kept in the ERTM tx queue until they are acked.  Once they're acked, 
> > the write buffer space is freed and additional skbs can be allocated.
> 
> It affects ERTM as well, but in that case the kernel doesn't crash
> because ERTM block on sending trying to allocate memory. Then we are not
> able to receive any ack (everything stays queued in sk_backlog_queue as
> the sk is owned by the user) and ERTM stalls.
> 

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
  2010-09-10 19:45     ` Gustavo F. Padovan
@ 2010-09-16  0:10       ` Gustavo F. Padovan
  -1 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-16  0:10 UTC (permalink / raw)
  To: Mat Martineau; +Cc: linux-bluetooth, netdev, linux-kernel, marcel, davem

* Gustavo F. Padovan <padovan@profusion.mobi> [2010-09-10 16:45:09 -0300]:

> Hi Mat,
> 
> * Mat Martineau <mathewm@codeaurora.org> [2010-09-10 09:53:31 -0700]:
> 
> > 
> > Gustavo -
> > 
> > I'm not sure why the streaming code used to work, but this does not 
> > look like an skb_clone() problem.  Your patch to remove the 
> > skb_clone() call in l2cap_streaming_send() addresses the root cause of 
> > this crash.
> > 
> > On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
> > 
> > > I've been experiencing some problems when running the L2CAP Streaming mode in
> > > 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> > > wasn't happening before, so I think we may have a regression here (I didn't
> > > find where yet). The crash log is below.
> > >
> > > The following patch does not fix the regression, but shows that removing the
> > > skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> > > patch is good anyway because it saves memory and time.
> > >
> > > By now I have no idea on how to fix this.
> > >
> > > <snip>
> > 
> > This has to do with the sk->sk_wmem_alloc accounting that controls the 
> > amount of write buffer space used on the socket.
> > 
> > When the L2CAP streaming mode socket segments its data, it allocates 
> > memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before 
> > that allocation call returns, skb_set_owner_w() is called on the new 
> > skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that 
> > sk->sk_wmem_alloc is correctly updated when the skb is freed.
> > 
> > When that skb is cloned, the clone is not "owned" by the write buffer. 
> > The clone's destructor is set to NULL in __skb_clone().  The version 
> > of l2cap_streaming_send() that runs out of memory is passing the 
> > non-owned skb clone down to the HCI layer.  The original skb (the one 
> > that's "owned by w") is immediately freed, which adjusts 
> > sk->sk_wmem_alloc back down - the socket thinks it has unlimited write 
> > buffer space.  As a result, bt_skb_send_alloc() never blocks waiting 
> > for buffer space (or returns EAGAIN for nonblocking writes) and the 
> > HCI send queue keeps growing.
> 
> If the problem is what you are saying, add a skb_set_owner_w(skb, sk) on
> the cloned skb should solve the problem, but it doesn't. That's exactly
> what tcp_transmit_skb() does. Also that just appeared in 2.6.36, is was
> working fine before, i.e, we have a regression here. 
> 
> > 
> > This isn't a problem for the ERTM sends, because the original skbs are 
> > kept in the ERTM tx queue until they are acked.  Once they're acked, 
> > the write buffer space is freed and additional skbs can be allocated.
> 
> It affects ERTM as well, but in that case the kernel doesn't crash
> because ERTM block on sending trying to allocate memory. Then we are not
> able to receive any ack (everything stays queued in sk_backlog_queue as
> the sk is owned by the user) and ERTM stalls.

By reverting 

commit 218bb9dfd21472128f86b38ad2eab123205c2991
Author: Gustavo F. Padovan <padovan@profusion.mobi>
Date:   Mon Jun 21 18:53:22 2010 -0300

    Bluetooth: Add backlog queue to ERTM code
    
    backlog queue is the canonical mechanism to avoid race conditions due
    interrupts in bottom half context. After the socket lock is released the
    net core take care of push all skb in its backlog queue.
    
    Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>


we can workaround the bug. It's not the real cause of the bug because the
backlog queue was working before 2.6.36, but removing the backlog queue scheme
make L2CAP works again and also give us more time to find the real cause of the
problem.
    
Before  implement the backlog queue L2CAP had his own lock scheme to avoid race
conditions. The point is that the backlog queue adds too much serialization to
L2CAP, that was the cause of the ERTM bug. The old scheme (the one we are going
to use after revert this commit) just serialize access to some places. 

By not using the backlog queue, we can receive the acknowledgement frames more
quickly and then free the acked frames quickly. In fact the old scheme is
looks better than backlog queue.

I think we should proceed with the revert with this commit in mainline, and at the
same time try to find the root cause of the problem.

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-16  0:10       ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-16  0:10 UTC (permalink / raw)
  To: Mat Martineau
  Cc: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	marcel-kz+m5ild9QBg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q

* Gustavo F. Padovan <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org> [2010-09-10 16:45:09 -0300]:

> Hi Mat,
> 
> * Mat Martineau <mathewm-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org> [2010-09-10 09:53:31 -0700]:
> 
> > 
> > Gustavo -
> > 
> > I'm not sure why the streaming code used to work, but this does not 
> > look like an skb_clone() problem.  Your patch to remove the 
> > skb_clone() call in l2cap_streaming_send() addresses the root cause of 
> > this crash.
> > 
> > On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
> > 
> > > I've been experiencing some problems when running the L2CAP Streaming mode in
> > > 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> > > wasn't happening before, so I think we may have a regression here (I didn't
> > > find where yet). The crash log is below.
> > >
> > > The following patch does not fix the regression, but shows that removing the
> > > skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> > > patch is good anyway because it saves memory and time.
> > >
> > > By now I have no idea on how to fix this.
> > >
> > > <snip>
> > 
> > This has to do with the sk->sk_wmem_alloc accounting that controls the 
> > amount of write buffer space used on the socket.
> > 
> > When the L2CAP streaming mode socket segments its data, it allocates 
> > memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before 
> > that allocation call returns, skb_set_owner_w() is called on the new 
> > skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that 
> > sk->sk_wmem_alloc is correctly updated when the skb is freed.
> > 
> > When that skb is cloned, the clone is not "owned" by the write buffer. 
> > The clone's destructor is set to NULL in __skb_clone().  The version 
> > of l2cap_streaming_send() that runs out of memory is passing the 
> > non-owned skb clone down to the HCI layer.  The original skb (the one 
> > that's "owned by w") is immediately freed, which adjusts 
> > sk->sk_wmem_alloc back down - the socket thinks it has unlimited write 
> > buffer space.  As a result, bt_skb_send_alloc() never blocks waiting 
> > for buffer space (or returns EAGAIN for nonblocking writes) and the 
> > HCI send queue keeps growing.
> 
> If the problem is what you are saying, add a skb_set_owner_w(skb, sk) on
> the cloned skb should solve the problem, but it doesn't. That's exactly
> what tcp_transmit_skb() does. Also that just appeared in 2.6.36, is was
> working fine before, i.e, we have a regression here. 
> 
> > 
> > This isn't a problem for the ERTM sends, because the original skbs are 
> > kept in the ERTM tx queue until they are acked.  Once they're acked, 
> > the write buffer space is freed and additional skbs can be allocated.
> 
> It affects ERTM as well, but in that case the kernel doesn't crash
> because ERTM block on sending trying to allocate memory. Then we are not
> able to receive any ack (everything stays queued in sk_backlog_queue as
> the sk is owned by the user) and ERTM stalls.

By reverting 

commit 218bb9dfd21472128f86b38ad2eab123205c2991
Author: Gustavo F. Padovan <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org>
Date:   Mon Jun 21 18:53:22 2010 -0300

    Bluetooth: Add backlog queue to ERTM code
    
    backlog queue is the canonical mechanism to avoid race conditions due
    interrupts in bottom half context. After the socket lock is released the
    net core take care of push all skb in its backlog queue.
    
    Signed-off-by: Gustavo F. Padovan <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org>
    Signed-off-by: Marcel Holtmann <marcel-kz+m5ild9QBg9hUCZPvPmw@public.gmane.org>


we can workaround the bug. It's not the real cause of the bug because the
backlog queue was working before 2.6.36, but removing the backlog queue scheme
make L2CAP works again and also give us more time to find the real cause of the
problem.
    
Before  implement the backlog queue L2CAP had his own lock scheme to avoid race
conditions. The point is that the backlog queue adds too much serialization to
L2CAP, that was the cause of the ERTM bug. The old scheme (the one we are going
to use after revert this commit) just serialize access to some places. 

By not using the backlog queue, we can receive the acknowledgement frames more
quickly and then free the acked frames quickly. In fact the old scheme is
looks better than backlog queue.

I think we should proceed with the revert with this commit in mainline, and at the
same time try to find the root cause of the problem.

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
  2010-09-16  0:10       ` Gustavo F. Padovan
  (?)
@ 2010-09-16 16:40       ` Mat Martineau
  2010-09-17 21:08         ` Gustavo F. Padovan
  -1 siblings, 1 reply; 21+ messages in thread
From: Mat Martineau @ 2010-09-16 16:40 UTC (permalink / raw)
  To: Gustavo F. Padovan; +Cc: linux-bluetooth, marcel


Gustavo -

Since this is now a only a discussion of blocking behavior with ERTM 
(and not a regression with skb_clone()), I've removed linux-kernel and 
netdev from the cc list.  If you think they still need to see this 
thread, please add them back in.


On Wed, 15 Sep 2010, Gustavo F. Padovan wrote:

> * Gustavo F. Padovan <padovan@profusion.mobi> [2010-09-10 16:45:09 -0300]:
>
>> Hi Mat,
>>
>> * Mat Martineau <mathewm@codeaurora.org> [2010-09-10 09:53:31 -0700]:
>>
>>>
>>> Gustavo -
>>>
>>> I'm not sure why the streaming code used to work, but this does not
>>> look like an skb_clone() problem.  Your patch to remove the
>>> skb_clone() call in l2cap_streaming_send() addresses the root cause of
>>> this crash.
>>>
>>> On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
>>>
>>>> I've been experiencing some problems when running the L2CAP Streaming mode in
>>>> 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
>>>> wasn't happening before, so I think we may have a regression here (I didn't
>>>> find where yet). The crash log is below.
>>>>
>>>> The following patch does not fix the regression, but shows that removing the
>>>> skb_clone() call from l2cap_streaming_send() we workaround the problem. The
>>>> patch is good anyway because it saves memory and time.
>>>>
>>>> By now I have no idea on how to fix this.
>>>>
>>>> <snip>
>>>
>>> This has to do with the sk->sk_wmem_alloc accounting that controls the
>>> amount of write buffer space used on the socket.
>>>
>>> When the L2CAP streaming mode socket segments its data, it allocates
>>> memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before
>>> that allocation call returns, skb_set_owner_w() is called on the new
>>> skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that
>>> sk->sk_wmem_alloc is correctly updated when the skb is freed.
>>>
>>> When that skb is cloned, the clone is not "owned" by the write buffer.
>>> The clone's destructor is set to NULL in __skb_clone().  The version
>>> of l2cap_streaming_send() that runs out of memory is passing the
>>> non-owned skb clone down to the HCI layer.  The original skb (the one
>>> that's "owned by w") is immediately freed, which adjusts
>>> sk->sk_wmem_alloc back down - the socket thinks it has unlimited write
>>> buffer space.  As a result, bt_skb_send_alloc() never blocks waiting
>>> for buffer space (or returns EAGAIN for nonblocking writes) and the
>>> HCI send queue keeps growing.
>>
>> If the problem is what you are saying, add a skb_set_owner_w(skb, sk) on
>> the cloned skb should solve the problem, but it doesn't. That's exactly
>> what tcp_transmit_skb() does. Also that just appeared in 2.6.36, is was
>> working fine before, i.e, we have a regression here.
>>
>>>
>>> This isn't a problem for the ERTM sends, because the original skbs are
>>> kept in the ERTM tx queue until they are acked.  Once they're acked,
>>> the write buffer space is freed and additional skbs can be allocated.
>>
>> It affects ERTM as well, but in that case the kernel doesn't crash
>> because ERTM block on sending trying to allocate memory. Then we are not
>> able to receive any ack (everything stays queued in sk_backlog_queue as
>> the sk is owned by the user) and ERTM stalls.
>
> By reverting
>
> commit 218bb9dfd21472128f86b38ad2eab123205c2991
> Author: Gustavo F. Padovan <padovan@profusion.mobi>
> Date:   Mon Jun 21 18:53:22 2010 -0300
>
>    Bluetooth: Add backlog queue to ERTM code
>
>    backlog queue is the canonical mechanism to avoid race conditions due
>    interrupts in bottom half context. After the socket lock is released the
>    net core take care of push all skb in its backlog queue.
>
>    Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
>    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
>
>
> we can workaround the bug. It's not the real cause of the bug because the
> backlog queue was working before 2.6.36, but removing the backlog queue scheme
> make L2CAP works again and also give us more time to find the real cause of the
> problem.
>
> Before  implement the backlog queue L2CAP had his own lock scheme to avoid race
> conditions. The point is that the backlog queue adds too much serialization to
> L2CAP, that was the cause of the ERTM bug. The old scheme (the one we are going
> to use after revert this commit) just serialize access to some places.
>
> By not using the backlog queue, we can receive the acknowledgement frames more
> quickly and then free the acked frames quickly. In fact the old scheme is
> looks better than backlog queue.
>
> I think we should proceed with the revert with this commit in mainline, and at the
> same time try to find the root cause of the problem.

I don't think the backlog patch should be reverted.  Although the old 
code would appear to get things working again, it's only because locks 
were being ignored in some important cases.  Maybe it seems ok most of 
the time, but the ERTM socket state was not being properly protected. 
The backlog is the right way to do it, but also depends on correct 
lock handling in the rest of L2CAP.


I think this is one of the locking issues from this message to 
linux-bluetooth ("ERTM known bugs/regressions (was Re: [PATCH 0/8] 
...)") on August 2:

http://www.spinics.net/lists/linux-bluetooth/msg06734.html


We've both arrived at the root cause already: incoming frames 
(including acks) get stuck on the backlog queue while the socket is 
locked waiting for memory -- and no memory gets freed until the acks 
are processed.

The only time the socket lock is held for a long time is when waiting 
for memory, so I think the best solution is to not hold the lock while 
allocating.  This should be safe, since the only socket state that's 
changed during bt_skb_send_alloc() is sk->sk_wmem_alloc, which is an 
atomic_t.

Then some changes are needed to these functions:

l2cap_sar_segment_sdu() and l2cap_create_iframe_pdu(): Take some args 
on the stack (like remote_mps) instead of accessing l2cap_pinfo. 
Don't add to TX_QUEUE(sk) within these functions (let the caller do 
that).

l2cap_sock_sendmsg(): Copy necessary data to local variables from 
l2cap_pinfo while the lock is held.  release_sock(sk), then call 
l2cap_sar_segment_sdu() (which will work for the "one PDU" case too) 
with the new args for mps, fcs, etc.  After l2cap_sar_segment_sdu() 
returns, check the socket state again (like the code at the beginning 
of l2cap_sock_sendmsg()) and then lock_sock(sk), queue the data, and 
send it.

(I know it would be better and clearer to send a patch for this, but 
I have some other things to fix right now!)


The other locking problem (with ERTM timers) also deserves some 
attention, but maybe in a new email thread!


Regards,

--
Mat Martineau
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
  2010-09-16 16:40       ` Mat Martineau
@ 2010-09-17 21:08         ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-17 21:08 UTC (permalink / raw)
  To: Mat Martineau; +Cc: linux-bluetooth, marcel

Hi Mat,

* Mat Martineau <mathewm@codeaurora.org> [2010-09-16 09:40:54 -0700]:

> 
> Gustavo -
> 
> Since this is now a only a discussion of blocking behavior with ERTM 
> (and not a regression with skb_clone()), I've removed linux-kernel and 
> netdev from the cc list.  If you think they still need to see this 
> thread, please add them back in.
> 
> 
> On Wed, 15 Sep 2010, Gustavo F. Padovan wrote:
> 
> > * Gustavo F. Padovan <padovan@profusion.mobi> [2010-09-10 16:45:09 -0300]:
> >
> >> Hi Mat,
> >>
> >> * Mat Martineau <mathewm@codeaurora.org> [2010-09-10 09:53:31 -0700]:
> >>
> >>>
> >>> Gustavo -
> >>>
> >>> I'm not sure why the streaming code used to work, but this does not
> >>> look like an skb_clone() problem.  Your patch to remove the
> >>> skb_clone() call in l2cap_streaming_send() addresses the root cause of
> >>> this crash.
> >>>
> >>> On Wed, 8 Sep 2010, Gustavo F. Padovan wrote:
> >>>
> >>>> I've been experiencing some problems when running the L2CAP Streaming mode in
> >>>> 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> >>>> wasn't happening before, so I think we may have a regression here (I didn't
> >>>> find where yet). The crash log is below.
> >>>>
> >>>> The following patch does not fix the regression, but shows that removing the
> >>>> skb_clone() call from l2cap_streaming_send() we workaround the problem. The
> >>>> patch is good anyway because it saves memory and time.
> >>>>
> >>>> By now I have no idea on how to fix this.
> >>>>
> >>>> <snip>
> >>>
> >>> This has to do with the sk->sk_wmem_alloc accounting that controls the
> >>> amount of write buffer space used on the socket.
> >>>
> >>> When the L2CAP streaming mode socket segments its data, it allocates
> >>> memory using sock_alloc_send_skb() (via bt_skb_send_alloc()).  Before
> >>> that allocation call returns, skb_set_owner_w() is called on the new
> >>> skb.  This adds to sk->sk_wmem_alloc and sets skb->destructor so that
> >>> sk->sk_wmem_alloc is correctly updated when the skb is freed.
> >>>
> >>> When that skb is cloned, the clone is not "owned" by the write buffer.
> >>> The clone's destructor is set to NULL in __skb_clone().  The version
> >>> of l2cap_streaming_send() that runs out of memory is passing the
> >>> non-owned skb clone down to the HCI layer.  The original skb (the one
> >>> that's "owned by w") is immediately freed, which adjusts
> >>> sk->sk_wmem_alloc back down - the socket thinks it has unlimited write
> >>> buffer space.  As a result, bt_skb_send_alloc() never blocks waiting
> >>> for buffer space (or returns EAGAIN for nonblocking writes) and the
> >>> HCI send queue keeps growing.
> >>
> >> If the problem is what you are saying, add a skb_set_owner_w(skb, sk) on
> >> the cloned skb should solve the problem, but it doesn't. That's exactly
> >> what tcp_transmit_skb() does. Also that just appeared in 2.6.36, is was
> >> working fine before, i.e, we have a regression here.
> >>
> >>>
> >>> This isn't a problem for the ERTM sends, because the original skbs are
> >>> kept in the ERTM tx queue until they are acked.  Once they're acked,
> >>> the write buffer space is freed and additional skbs can be allocated.
> >>
> >> It affects ERTM as well, but in that case the kernel doesn't crash
> >> because ERTM block on sending trying to allocate memory. Then we are not
> >> able to receive any ack (everything stays queued in sk_backlog_queue as
> >> the sk is owned by the user) and ERTM stalls.
> >
> > By reverting
> >
> > commit 218bb9dfd21472128f86b38ad2eab123205c2991
> > Author: Gustavo F. Padovan <padovan@profusion.mobi>
> > Date:   Mon Jun 21 18:53:22 2010 -0300
> >
> >    Bluetooth: Add backlog queue to ERTM code
> >
> >    backlog queue is the canonical mechanism to avoid race conditions due
> >    interrupts in bottom half context. After the socket lock is released the
> >    net core take care of push all skb in its backlog queue.
> >
> >    Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
> >    Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
> >
> >
> > we can workaround the bug. It's not the real cause of the bug because the
> > backlog queue was working before 2.6.36, but removing the backlog queue scheme
> > make L2CAP works again and also give us more time to find the real cause of the
> > problem.
> >
> > Before  implement the backlog queue L2CAP had his own lock scheme to avoid race
> > conditions. The point is that the backlog queue adds too much serialization to
> > L2CAP, that was the cause of the ERTM bug. The old scheme (the one we are going
> > to use after revert this commit) just serialize access to some places.
> >
> > By not using the backlog queue, we can receive the acknowledgement frames more
> > quickly and then free the acked frames quickly. In fact the old scheme is
> > looks better than backlog queue.
> >
> > I think we should proceed with the revert with this commit in mainline, and at the
> > same time try to find the root cause of the problem.
> 
> I don't think the backlog patch should be reverted.  Although the old 
> code would appear to get things working again, it's only because locks 
> were being ignored in some important cases.  Maybe it seems ok most of 
> the time, but the ERTM socket state was not being properly protected. 
> The backlog is the right way to do it, but also depends on correct 
> lock handling in the rest of L2CAP.
> 
> 
> I think this is one of the locking issues from this message to 
> linux-bluetooth ("ERTM known bugs/regressions (was Re: [PATCH 0/8] 
> ...)") on August 2:
> 
> http://www.spinics.net/lists/linux-bluetooth/msg06734.html
> 
> 
> We've both arrived at the root cause already: incoming frames 
> (including acks) get stuck on the backlog queue while the socket is 
> locked waiting for memory -- and no memory gets freed until the acks 
> are processed.
> 
> The only time the socket lock is held for a long time is when waiting 
> for memory, so I think the best solution is to not hold the lock while 
> allocating.  This should be safe, since the only socket state that's 
> changed during bt_skb_send_alloc() is sk->sk_wmem_alloc, which is an 
> atomic_t.
> 
> Then some changes are needed to these functions:
> 
> l2cap_sar_segment_sdu() and l2cap_create_iframe_pdu(): Take some args 
> on the stack (like remote_mps) instead of accessing l2cap_pinfo. 
> Don't add to TX_QUEUE(sk) within these functions (let the caller do 
> that).
> 
> l2cap_sock_sendmsg(): Copy necessary data to local variables from 
> l2cap_pinfo while the lock is held.  release_sock(sk), then call 
> l2cap_sar_segment_sdu() (which will work for the "one PDU" case too) 
> with the new args for mps, fcs, etc.  After l2cap_sar_segment_sdu() 
> returns, check the socket state again (like the code at the beginning 
> of l2cap_sock_sendmsg()) and then lock_sock(sk), queue the data, and 
> send it.

If we are going to release the lock before alloc memory we are assuming
the no one will change the mps, fcs and other things we need inside
those functions, so in the big picture just add a release_lock() before
alloc memory and after that lock the socket again should work nicely and
we won't need any local variable.

However I did find any other code in net/ that release the lock before
alloc memory. ERTM is a similar (and simplified) version of TCP. TCP
uses backlog queues as well and alloc memory with the socket locked.
I'm not completely conviced that this is the better way to fix that, I'm
also looking to others solutions, but haven't figured out them yet. ;)

> 
> (I know it would be better and clearer to send a patch for this, but 
> I have some other things to fix right now!)
> 
> 
> The other locking problem (with ERTM timers) also deserves some 
> attention, but maybe in a new email thread!

Yes, we need to look to it. Let's start a new mail thread.

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-21  4:10     ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-21  4:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-bluetooth, netdev, linux-kernel, marcel

* Andrew Morton <akpm@linux-foundation.org> [2010-09-17 15:34:40 -0700]:

> On Mon, 30 Aug 2010 20:10:23 -0300
> "Gustavo F. Padovan" <padovan@profusion.mobi> wrote:
> 
> > I've been experiencing some problems when running the L2CAP Streaming mode in
> > 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> > wasn't happening before, so I think we may have a regression here (I didn't find
> > where yet). The crash log is below.
> 
> So 2.6.35 was OK?
> 
> > The following patch does not fix the regression, but shows that removing the
> > skb_clone() call from l2cap_streaming_send() makes the problem goes away. The
> > patch is good anyway since it saves memory and time when sending Streaming mode
> > packets.
> 
> How does "make it go away" differ from "fix the regression" :)
> 
> But yes, if we don't understand the source of the bug then it may well
> still be there.  Did you make any progress in understanding the cause?

I did some progress and found that this really fix the bug. We are fine
here then. :)

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-21  4:10     ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-09-21  4:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	marcel-kz+m5ild9QBg9hUCZPvPmw

* Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org> [2010-09-17 15:34:40 -0700]:

> On Mon, 30 Aug 2010 20:10:23 -0300
> "Gustavo F. Padovan" <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org> wrote:
> 
> > I've been experiencing some problems when running the L2CAP Streaming mode in
> > 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> > wasn't happening before, so I think we may have a regression here (I didn't find
> > where yet). The crash log is below.
> 
> So 2.6.35 was OK?
> 
> > The following patch does not fix the regression, but shows that removing the
> > skb_clone() call from l2cap_streaming_send() makes the problem goes away. The
> > patch is good anyway since it saves memory and time when sending Streaming mode
> > packets.
> 
> How does "make it go away" differ from "fix the regression" :)
> 
> But yes, if we don't understand the source of the bug then it may well
> still be there.  Did you make any progress in understanding the cause?

I did some progress and found that this really fix the bug. We are fine
here then. :)

-- 
Gustavo F. Padovan
ProFUSION embedded systems - http://profusion.mobi

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-17 22:34   ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2010-09-17 22:34 UTC (permalink / raw)
  To: Gustavo F. Padovan; +Cc: linux-bluetooth, netdev, linux-kernel, marcel

On Mon, 30 Aug 2010 20:10:23 -0300
"Gustavo F. Padovan" <padovan@profusion.mobi> wrote:

> I've been experiencing some problems when running the L2CAP Streaming mode in
> 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> wasn't happening before, so I think we may have a regression here (I didn't find
> where yet). The crash log is below.

So 2.6.35 was OK?

> The following patch does not fix the regression, but shows that removing the
> skb_clone() call from l2cap_streaming_send() makes the problem goes away. The
> patch is good anyway since it saves memory and time when sending Streaming mode
> packets.

How does "make it go away" differ from "fix the regression" :)

But yes, if we don't understand the source of the bug then it may well
still be there.  Did you make any progress in understanding the cause?

> 
> [ 5066.137533] Bluetooth: L2CAP ver 2.15
> [ 5066.137873] Bluetooth: L2CAP socket layer initialized
> [ 5066.545179] Bluetooth: RFCOMM TTY layer initialized
> [ 5066.545879] Bluetooth: RFCOMM socket layer initialized
> [ 5066.546582] Bluetooth: RFCOMM ver 1.11
> [ 5092.268021] l2test invoked oom-killer: gfp_mask=0x4d0, order=0, oom_adj=0, oom_score_adj=0
> [ 5092.268872] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
> [ 5092.269863] Call Trace:
> [ 5092.270265]  [<ffffffff8138b6a6>] ? _raw_spin_unlock+0x26/0x30
> [ 5092.270878]  [<ffffffff810c0827>] T.427+0x77/0x1e0
> [ 5092.271874]  [<ffffffff811b85e7>] ? security_real_capable_noaudit+0x37/0x60
> [ 5092.272956]  [<ffffffff810c0e3a>] out_of_memory+0x2ca/0x2f0
> [ 5092.273894]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
> [ 5092.274871]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
> [ 5092.275864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
> [ 5092.276876]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
> [ 5092.277865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.278652]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
> [ 5092.278927]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.279864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
> [ 5092.280864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
> [ 5092.281867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
> [ 5092.282865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
> [ 5092.283932]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
> [ 5092.284865]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
> [ 5092.285652]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
> [ 5092.285864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
> [ 5092.286864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
> [ 5092.287864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
> [ 5092.288872]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
> [ 5092.289927]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
> [ 5092.290864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.291649]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.291864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.292864]  [<ffffffff8130740f>] sys_send+0xf/0x20
> [ 5092.293870]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
> [ 5092.294875]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30

That's a pretty serious regression.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Possible regression with skb_clone() in 2.6.36
@ 2010-09-17 22:34   ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2010-09-17 22:34 UTC (permalink / raw)
  To: Gustavo F. Padovan
  Cc: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	marcel-kz+m5ild9QBg9hUCZPvPmw

On Mon, 30 Aug 2010 20:10:23 -0300
"Gustavo F. Padovan" <padovan-Y3ZbgMPKUGA34EUeqzHoZw@public.gmane.org> wrote:

> I've been experiencing some problems when running the L2CAP Streaming mode in
> 2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
> wasn't happening before, so I think we may have a regression here (I didn't find
> where yet). The crash log is below.

So 2.6.35 was OK?

> The following patch does not fix the regression, but shows that removing the
> skb_clone() call from l2cap_streaming_send() makes the problem goes away. The
> patch is good anyway since it saves memory and time when sending Streaming mode
> packets.

How does "make it go away" differ from "fix the regression" :)

But yes, if we don't understand the source of the bug then it may well
still be there.  Did you make any progress in understanding the cause?

> 
> [ 5066.137533] Bluetooth: L2CAP ver 2.15
> [ 5066.137873] Bluetooth: L2CAP socket layer initialized
> [ 5066.545179] Bluetooth: RFCOMM TTY layer initialized
> [ 5066.545879] Bluetooth: RFCOMM socket layer initialized
> [ 5066.546582] Bluetooth: RFCOMM ver 1.11
> [ 5092.268021] l2test invoked oom-killer: gfp_mask=0x4d0, order=0, oom_adj=0, oom_score_adj=0
> [ 5092.268872] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
> [ 5092.269863] Call Trace:
> [ 5092.270265]  [<ffffffff8138b6a6>] ? _raw_spin_unlock+0x26/0x30
> [ 5092.270878]  [<ffffffff810c0827>] T.427+0x77/0x1e0
> [ 5092.271874]  [<ffffffff811b85e7>] ? security_real_capable_noaudit+0x37/0x60
> [ 5092.272956]  [<ffffffff810c0e3a>] out_of_memory+0x2ca/0x2f0
> [ 5092.273894]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
> [ 5092.274871]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
> [ 5092.275864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
> [ 5092.276876]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
> [ 5092.277865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.278652]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
> [ 5092.278927]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.279864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
> [ 5092.280864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
> [ 5092.281867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
> [ 5092.282865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
> [ 5092.283932]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
> [ 5092.284865]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
> [ 5092.285652]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
> [ 5092.285864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
> [ 5092.286864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
> [ 5092.287864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
> [ 5092.288872]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
> [ 5092.289927]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
> [ 5092.290864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.291649]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.291864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
> [ 5092.292864]  [<ffffffff8130740f>] sys_send+0xf/0x20
> [ 5092.293870]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
> [ 5092.294875]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30

That's a pretty serious regression.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Possible regression with skb_clone() in 2.6.36
@ 2010-08-30 23:10 ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-08-30 23:10 UTC (permalink / raw)
  To: linux-bluetooth; +Cc: netdev, linux-kernel, marcel

I've been experiencing some problems when running the L2CAP Streaming mode in
2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
wasn't happening before, so I think we may have a regression here (I didn't find
where yet). The crash log is below.

The following patch does not fix the regression, but shows that removing the
skb_clone() call from l2cap_streaming_send() makes the problem goes away. The
patch is good anyway since it saves memory and time when sending Streaming mode
packets.


[ 5066.137533] Bluetooth: L2CAP ver 2.15
[ 5066.137873] Bluetooth: L2CAP socket layer initialized
[ 5066.545179] Bluetooth: RFCOMM TTY layer initialized
[ 5066.545879] Bluetooth: RFCOMM socket layer initialized
[ 5066.546582] Bluetooth: RFCOMM ver 1.11
[ 5092.268021] l2test invoked oom-killer: gfp_mask=0x4d0, order=0, oom_adj=0, oom_score_adj=0
[ 5092.268872] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
[ 5092.269863] Call Trace:
[ 5092.270265]  [<ffffffff8138b6a6>] ? _raw_spin_unlock+0x26/0x30
[ 5092.270878]  [<ffffffff810c0827>] T.427+0x77/0x1e0
[ 5092.271874]  [<ffffffff811b85e7>] ? security_real_capable_noaudit+0x37/0x60
[ 5092.272956]  [<ffffffff810c0e3a>] out_of_memory+0x2ca/0x2f0
[ 5092.273894]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
[ 5092.274871]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
[ 5092.275864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
[ 5092.276876]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
[ 5092.277865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.278652]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
[ 5092.278927]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.279864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
[ 5092.280864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
[ 5092.281867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
[ 5092.282865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
[ 5092.283932]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.284865]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
[ 5092.285652]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.285864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.286864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
[ 5092.287864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
[ 5092.288872]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
[ 5092.289927]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
[ 5092.290864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.291649]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.291864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.292864]  [<ffffffff8130740f>] sys_send+0xf/0x20
[ 5092.293870]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
[ 5092.294875]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30
[ 5092.295673] Mem-Info:
[ 5092.295863] DMA per-cpu:
[ 5092.296214] CPU    0: hi:    0, btch:   1 usd:   0
[ 5092.296925] DMA32 per-cpu:
[ 5092.297298] CPU    0: hi:   90, btch:  15 usd:  85
[ 5092.297864] active_anon:504 inactive_anon:543 isolated_anon:0
[ 5092.297865]  active_file:14 inactive_file:3 isolated_file:0
[ 5092.297866]  unevictable:0 dirty:0 writeback:3 unstable:0
[ 5092.297866]  free:728 slab_reclaimable:559 slab_unreclaimable:39895
[ 5092.297867]  mapped:12 shmem:35 pagetables:84 bounce:0
[ 5092.299944] DMA free:1064kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:14916kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 5092.301862] lowmem_reserve[]: 0 236 236 236
[ 5092.303134] DMA32 free:1848kB min:1904kB low:2380kB high:2856kB active_anon:2016kB inactive_anon:2172kB active_file:56kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:242380kB mlocked:0kB dirty:0kB writeback:12kB mapped:48kB shmem:140kB slab_reclaimable:2236kB slab_unreclaimable:144664kB kernel_stack:456kB pagetables:336kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:103 all_unreclaimable? yes
[ 5092.304962] lowmem_reserve[]: 0 0 0 0
[ 5092.306066] DMA: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1064kB
[ 5092.307862] DMA32: 10*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
[ 5092.309467] 52 total pagecache pages
[ 5092.309862] 0 pages in swap cache
[ 5092.310317] Swap cache stats: add 0, delete 0, find 0/0
[ 5092.310862] Free swap  = 0kB
[ 5092.311260] Total swap = 0kB
[ 5092.313796] 65530 pages RAM
[ 5092.313868] 22603 pages reserved
[ 5092.314324] 206 pages shared
[ 5092.314862] 39353 pages non-shared
[ 5092.315348] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[ 5092.315867] [    1]     0     1      438       19   0       0             0 init
[ 5092.316871] [ 1018]     0  1018      537       75   0     -17         -1000 udevd
[ 5092.318867] [ 2217]   101  2217      606       39   0       0             0 dbus-daemon
[ 5092.319865] [ 2825]     0  2825      481       29   0       0             0 dhcpcd
[ 5092.320865] [ 3497]     0  3497     1091       83   0       0             0 sshd
[ 5092.321969] [ 3535]     0  3535      477       31   0       0             0 agetty
[ 5092.322864] [ 3536]     0  3536      477       30   0       0             0 agetty
[ 5092.323864] [ 3537]     0  3537      477       29   0       0             0 agetty
[ 5092.325937] [ 3538]     0  3538      477       31   0       0             0 agetty
[ 5092.326865] [ 3539]     0  3539      477       31   0       0             0 agetty
[ 5092.327864] [ 3540]     0  3540      477       31   0       0             0 agetty
[ 5092.328864] [ 3541]     0  3541     1738      105   0       0             0 sshd
[ 5092.329865] [ 3545]     0  3545      745       72   0       0             0 bash
[ 5092.330864] [ 3550]     0  3550      706       43   0       0             0 screen
[ 5092.331866] [ 3551]     0  3551      772      124   0       0             0 screen
[ 5092.332965] [ 3552]     0  3552      745       77   0       0             0 bash
[ 5092.333865] [ 3557]     0  3557      745       77   0       0             0 bash
[ 5092.334867] [ 3562]     0  3562      536       75   0     -17         -1000 udevd
[ 5092.336865] [ 3567]     0  3567      536       75   0     -17         -1000 udevd
[ 5092.337865] [ 3887]     0  3887      935       54   0       0             0 bluetoothd
[ 5092.338864] [ 3896]     0  3896      458       24   0       0             0 l2test
[ 5092.339864] [ 3897]     0  3897      458       24   0       0             0 l2test
[ 5092.340933] [ 3942]     0  3942      458       29   0       0             0 l2test
[ 5092.341863] Kernel panic - not syncing: Out of memory and no killable processes...
[ 5092.341864] 
[ 5092.342864] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
[ 5092.343862] Call Trace:
[ 5092.344934]  [<ffffffff813884d9>] panic+0x8c/0x199
[ 5092.345621]  [<ffffffff810c0e54>] out_of_memory+0x2e4/0x2f0
[ 5092.345864]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
[ 5092.346894]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
[ 5092.347864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
[ 5092.348871]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
[ 5092.349865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.350628]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
[ 5092.350935]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.351864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
[ 5092.352864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
[ 5092.353867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
[ 5092.354865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
[ 5092.355864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.356864]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
[ 5092.357612]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.357931]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.358864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
[ 5092.359864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
[ 5092.360929]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
[ 5092.361866]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
[ 5092.362864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.363653]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.364868]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.365865]  [<ffffffff8130740f>] sys_send+0xf/0x20
[ 5092.366945]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
[ 5092.367865]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30


-------
Gustavo F. Padovan (1):
	      Bluetooth: Simplify L2CAP Streaming mode sending



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Possible regression with skb_clone() in 2.6.36
@ 2010-08-30 23:10 ` Gustavo F. Padovan
  0 siblings, 0 replies; 21+ messages in thread
From: Gustavo F. Padovan @ 2010-08-30 23:10 UTC (permalink / raw)
  To: linux-bluetooth-u79uwXL29TY76Z2rM5mHXA
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	marcel-kz+m5ild9QBg9hUCZPvPmw

I've been experiencing some problems when running the L2CAP Streaming mode in
2.6.36. The system quickly runs in an Out Of Memory condition and crash. That
wasn't happening before, so I think we may have a regression here (I didn't find
where yet). The crash log is below.

The following patch does not fix the regression, but shows that removing the
skb_clone() call from l2cap_streaming_send() makes the problem goes away. The
patch is good anyway since it saves memory and time when sending Streaming mode
packets.


[ 5066.137533] Bluetooth: L2CAP ver 2.15
[ 5066.137873] Bluetooth: L2CAP socket layer initialized
[ 5066.545179] Bluetooth: RFCOMM TTY layer initialized
[ 5066.545879] Bluetooth: RFCOMM socket layer initialized
[ 5066.546582] Bluetooth: RFCOMM ver 1.11
[ 5092.268021] l2test invoked oom-killer: gfp_mask=0x4d0, order=0, oom_adj=0, oom_score_adj=0
[ 5092.268872] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
[ 5092.269863] Call Trace:
[ 5092.270265]  [<ffffffff8138b6a6>] ? _raw_spin_unlock+0x26/0x30
[ 5092.270878]  [<ffffffff810c0827>] T.427+0x77/0x1e0
[ 5092.271874]  [<ffffffff811b85e7>] ? security_real_capable_noaudit+0x37/0x60
[ 5092.272956]  [<ffffffff810c0e3a>] out_of_memory+0x2ca/0x2f0
[ 5092.273894]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
[ 5092.274871]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
[ 5092.275864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
[ 5092.276876]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
[ 5092.277865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.278652]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
[ 5092.278927]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.279864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
[ 5092.280864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
[ 5092.281867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
[ 5092.282865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
[ 5092.283932]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.284865]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
[ 5092.285652]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.285864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.286864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
[ 5092.287864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
[ 5092.288872]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
[ 5092.289927]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
[ 5092.290864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.291649]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.291864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.292864]  [<ffffffff8130740f>] sys_send+0xf/0x20
[ 5092.293870]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
[ 5092.294875]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30
[ 5092.295673] Mem-Info:
[ 5092.295863] DMA per-cpu:
[ 5092.296214] CPU    0: hi:    0, btch:   1 usd:   0
[ 5092.296925] DMA32 per-cpu:
[ 5092.297298] CPU    0: hi:   90, btch:  15 usd:  85
[ 5092.297864] active_anon:504 inactive_anon:543 isolated_anon:0
[ 5092.297865]  active_file:14 inactive_file:3 isolated_file:0
[ 5092.297866]  unevictable:0 dirty:0 writeback:3 unstable:0
[ 5092.297866]  free:728 slab_reclaimable:559 slab_unreclaimable:39895
[ 5092.297867]  mapped:12 shmem:35 pagetables:84 bounce:0
[ 5092.299944] DMA free:1064kB min:124kB low:152kB high:184kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:14916kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 5092.301862] lowmem_reserve[]: 0 236 236 236
[ 5092.303134] DMA32 free:1848kB min:1904kB low:2380kB high:2856kB active_anon:2016kB inactive_anon:2172kB active_file:56kB inactive_file:12kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:242380kB mlocked:0kB dirty:0kB writeback:12kB mapped:48kB shmem:140kB slab_reclaimable:2236kB slab_unreclaimable:144664kB kernel_stack:456kB pagetables:336kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:103 all_unreclaimable? yes
[ 5092.304962] lowmem_reserve[]: 0 0 0 0
[ 5092.306066] DMA: 0*4kB 1*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1064kB
[ 5092.307862] DMA32: 10*4kB 2*8kB 0*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
[ 5092.309467] 52 total pagecache pages
[ 5092.309862] 0 pages in swap cache
[ 5092.310317] Swap cache stats: add 0, delete 0, find 0/0
[ 5092.310862] Free swap  = 0kB
[ 5092.311260] Total swap = 0kB
[ 5092.313796] 65530 pages RAM
[ 5092.313868] 22603 pages reserved
[ 5092.314324] 206 pages shared
[ 5092.314862] 39353 pages non-shared
[ 5092.315348] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
[ 5092.315867] [    1]     0     1      438       19   0       0             0 init
[ 5092.316871] [ 1018]     0  1018      537       75   0     -17         -1000 udevd
[ 5092.318867] [ 2217]   101  2217      606       39   0       0             0 dbus-daemon
[ 5092.319865] [ 2825]     0  2825      481       29   0       0             0 dhcpcd
[ 5092.320865] [ 3497]     0  3497     1091       83   0       0             0 sshd
[ 5092.321969] [ 3535]     0  3535      477       31   0       0             0 agetty
[ 5092.322864] [ 3536]     0  3536      477       30   0       0             0 agetty
[ 5092.323864] [ 3537]     0  3537      477       29   0       0             0 agetty
[ 5092.325937] [ 3538]     0  3538      477       31   0       0             0 agetty
[ 5092.326865] [ 3539]     0  3539      477       31   0       0             0 agetty
[ 5092.327864] [ 3540]     0  3540      477       31   0       0             0 agetty
[ 5092.328864] [ 3541]     0  3541     1738      105   0       0             0 sshd
[ 5092.329865] [ 3545]     0  3545      745       72   0       0             0 bash
[ 5092.330864] [ 3550]     0  3550      706       43   0       0             0 screen
[ 5092.331866] [ 3551]     0  3551      772      124   0       0             0 screen
[ 5092.332965] [ 3552]     0  3552      745       77   0       0             0 bash
[ 5092.333865] [ 3557]     0  3557      745       77   0       0             0 bash
[ 5092.334867] [ 3562]     0  3562      536       75   0     -17         -1000 udevd
[ 5092.336865] [ 3567]     0  3567      536       75   0     -17         -1000 udevd
[ 5092.337865] [ 3887]     0  3887      935       54   0       0             0 bluetoothd
[ 5092.338864] [ 3896]     0  3896      458       24   0       0             0 l2test
[ 5092.339864] [ 3897]     0  3897      458       24   0       0             0 l2test
[ 5092.340933] [ 3942]     0  3942      458       29   0       0             0 l2test
[ 5092.341863] Kernel panic - not syncing: Out of memory and no killable processes...
[ 5092.341864] 
[ 5092.342864] Pid: 3897, comm: l2test Not tainted 2.6.36-rc3 #5
[ 5092.343862] Call Trace:
[ 5092.344934]  [<ffffffff813884d9>] panic+0x8c/0x199
[ 5092.345621]  [<ffffffff810c0e54>] out_of_memory+0x2e4/0x2f0
[ 5092.345864]  [<ffffffff810c3d43>] __alloc_pages_nodemask+0x693/0x6b0
[ 5092.346894]  [<ffffffff810ea3e6>] cache_alloc_refill+0x2d6/0x5c0
[ 5092.347864]  [<ffffffff810ea805>] __kmalloc+0x135/0x150
[ 5092.348871]  [<ffffffff8130f2ae>] __alloc_skb+0x6e/0x150
[ 5092.349865]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.350628]  [<ffffffff8130ace2>] sock_alloc_send_pskb+0x1c2/0x320
[ 5092.350935]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.351864]  [<ffffffff81312add>] ? memcpy_fromiovec+0x6d/0x90
[ 5092.352864]  [<ffffffff8130ae50>] sock_alloc_send_skb+0x10/0x20
[ 5092.353867]  [<ffffffffa00e600f>] l2cap_create_iframe_pdu+0x9f/0x2c0 [l2cap]
[ 5092.354865]  [<ffffffffa00e84b9>] l2cap_sock_sendmsg+0x5d9/0x910 [l2cap]
[ 5092.355864]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.356864]  [<ffffffff8130725b>] sock_sendmsg+0xdb/0x100
[ 5092.357612]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.357931]  [<ffffffff8138ba3c>] ? restore_args+0x0/0x30
[ 5092.358864]  [<ffffffff813073c0>] sys_sendto+0xf0/0x130
[ 5092.359864]  [<ffffffff8138b60b>] ? _raw_spin_unlock_irq+0x2b/0x40
[ 5092.360929]  [<ffffffff81093cfd>] ? trace_hardirqs_on_caller+0x13d/0x180
[ 5092.361866]  [<ffffffff81093d4d>] ? trace_hardirqs_on+0xd/0x10
[ 5092.362864]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.363653]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.364868]  [<ffffffff810d3a00>] ? might_fault+0x40/0x90
[ 5092.365865]  [<ffffffff8130740f>] sys_send+0xf/0x20
[ 5092.366945]  [<ffffffff8132ccd6>] compat_sys_socketcall+0x146/0x1f0
[ 5092.367865]  [<ffffffff81053ee4>] sysenter_dispatch+0x7/0x30


-------
Gustavo F. Padovan (1):
	      Bluetooth: Simplify L2CAP Streaming mode sending

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-09-21  4:23 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-08 23:32 Possible regression with skb_clone() in 2.6.36 Gustavo F. Padovan
2010-09-08 23:32 ` Gustavo F. Padovan
2010-09-08 23:32 ` [PATCH] Bluetooth: Simplify L2CAP Streaming mode sending Gustavo F. Padovan
2010-09-10 10:15 ` Possible regression with skb_clone() in 2.6.36 Jarek Poplawski
2010-09-10 10:15   ` Jarek Poplawski
2010-09-10 16:53 ` Mat Martineau
2010-09-10 16:53   ` Mat Martineau
2010-09-10 19:45   ` Gustavo F. Padovan
2010-09-10 19:45     ` Gustavo F. Padovan
2010-09-15 18:42     ` Gustavo F. Padovan
2010-09-15 18:42       ` Gustavo F. Padovan
2010-09-16  0:10     ` Gustavo F. Padovan
2010-09-16  0:10       ` Gustavo F. Padovan
2010-09-16 16:40       ` Mat Martineau
2010-09-17 21:08         ` Gustavo F. Padovan
  -- strict thread matches above, loose matches on Subject: below --
2010-08-30 23:10 Gustavo F. Padovan
2010-08-30 23:10 ` Gustavo F. Padovan
2010-09-17 22:34 ` Andrew Morton
2010-09-17 22:34   ` Andrew Morton
2010-09-21  4:10   ` Gustavo F. Padovan
2010-09-21  4:10     ` Gustavo F. Padovan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.