All of lore.kernel.org
 help / color / mirror / Atom feed
* NULL pointer dereference panic in stable (2.6.33.2), amd64
@ 2010-04-11 20:38 Denys Fedorysychenko
  2010-04-11 22:35 ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: Denys Fedorysychenko @ 2010-04-11 20:38 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: Text/Plain, Size: 386 bytes --]

Hi

Got recently NULL pointer dereference
Note - it seems in 32-bit i didn't experience this issue.

Router with NAT, HFSC shapers terminated with bfifo qdiscs (some attached to 
802.1Q vlan's), userspace application using tun.
Hardware - network cards e1000e, Core 2 based architecture (two Quad Xeon), 

full message received over netconsole attached
More info can provide on request

[-- Attachment #2: BUG --]
[-- Type: text/plain, Size: 7186 bytes --]

Apr 11 23:28:52 ROUTER kernel: [21095.453138] BUG: unable to handle kernel NULL pointer dereference at (null)
Apr 11 23:28:52 ROUTER kernel: [21095.453334] IP: [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
Apr 11 23:28:52 ROUTER kernel: [21095.453522] PGD 20b247067 PUD 20b24c067 PMD 0 
Apr 11 23:28:52 ROUTER kernel: [21095.453704] Oops: 0000 [#1] SMP 
Apr 11 23:28:52 ROUTER kernel: [21095.453880] last sysfs file: /sys/devices/virtual/misc/tun/dev
Apr 11 23:28:52 ROUTER kernel: [21095.454001] CPU 0 
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Pid: 2864, comm: globax Not tainted 2.6.33.2-build-0051-64 #4         /        
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP: 0010:[<ffffffff811e5877>]  [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RSP: 0000:ffff880028203db0  EFLAGS: 00010202
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff8802087d8d00
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RDX: ffff88021d818000 RSI: 0000000000000000 RDI: ffff8802135850e8
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RBP: ffff880028203de0 R08: ffff88021c01d89c R09: ffff88021c01dc00
Apr 11 23:28:52 ROUTER kernel: [21095.454001] R10: dead000000200200 R11: dead000000100100 R12: ffff88021e212b80
Apr 11 23:28:52 ROUTER kernel: [21095.454001] R13: ffff88021da51e80 R14: ffff8802135850e8 R15: ffff88021be62000
Apr 11 23:28:52 ROUTER kernel: [21095.454001] FS:  0000000000000000(0000) GS:ffff880028200000(0063) knlGS:00000000f7e51b10
Apr 11 23:28:52 ROUTER kernel: [21095.454001] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000 CR3: 000000020b248000 CR4: 00000000000006f0
Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Process globax (pid: 2864, threadinfo ffff8802118c0000, task ffff8802182c2c20)
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Stack:
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  ffff88021d818000 ffff88021da51e80 0000000000000042 ffff88021da51e80
Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> ffff88021be62000 ffff88021be62000 ffff880028203e00 ffffffffa01c12a9
Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> 0000000000000000 ffff8802135850e8 ffff880028203e50 ffffffff811e540e
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Call Trace:
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  <IRQ> 
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811e2c89>] net_tx_action+0xc2/0x120
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81005543>] do_softirq+0x33/0x68
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81039407>] irq_exit+0x36/0x75
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  <EOI>
Apr 11 23:28:52 ROUTER kernel: [21095.454001] Code: e2 48 8b 55 d0 49 c1 e4 07 66 41 8b 86 a6 00 00 00 4c 03 a2 00 03 00 00 80 e4 cf 80 cc 20 49 8b 5c 24 08 66 41 89 86 a6 00 00 00 <48> 83 3b 00 0f 84 bb 00 00 00 4c 8d ab 9c 00 00 00 4c 89 ef e8
Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP  [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
Apr 11 23:28:52 ROUTER kernel: [21095.454001]  RSP <ffff880028203db0>
Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000
Apr 11 23:28:52 ROUTER kernel: [21095.462975] ---[ end trace 2421d995c1afd7c3 ]---
Apr 11 23:28:52 ROUTER kernel: [21095.463199] Kernel panic - not syncing: Fatal exception in interrupt
Apr 11 23:28:52 ROUTER kernel: [21095.463428] Pid: 2864, comm: globax Tainted: G      D    2.6.33.2-build-0051-64 #4
Apr 11 23:28:52 ROUTER kernel: [21095.463819] Call Trace:
Apr 11 23:28:52 ROUTER kernel: [21095.464036]  <IRQ>  [<ffffffff81259743>] panic+0xa0/0x161
Apr 11 23:28:52 ROUTER kernel: [21095.464307]  [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20
Apr 11 23:28:52 ROUTER kernel: [21095.464533]  [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c
Apr 11 23:28:52 ROUTER kernel: [21095.464757]  [<ffffffff81006651>] oops_end+0xaa/0xba
Apr 11 23:28:52 ROUTER kernel: [21095.464980]  [<ffffffff8101e653>] no_context+0x1f3/0x202
Apr 11 23:28:52 ROUTER kernel: [21095.465211]  [<ffffffff810523a4>] ? tick_program_event+0x25/0x27
Apr 11 23:28:52 ROUTER kernel: [21095.465437]  [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0
Apr 11 23:28:52 ROUTER kernel: [21095.465668]  [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5
Apr 11 23:28:52 ROUTER kernel: [21095.465907]  [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e]
Apr 11 23:28:52 ROUTER kernel: [21095.466139]  [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25
Apr 11 23:28:52 ROUTER kernel: [21095.466372]  [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e]
Apr 11 23:28:52 ROUTER kernel: [21095.466605]  [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e]
Apr 11 23:28:52 ROUTER kernel: [21095.466833]  [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10
Apr 11 23:28:52 ROUTER kernel: [21095.467064]  [<ffffffff8101eb32>] do_page_fault+0x114/0x24a
Apr 11 23:28:52 ROUTER kernel: [21095.467293]  [<ffffffff8125bc9f>] page_fault+0x1f/0x30
Apr 11 23:28:52 ROUTER kernel: [21095.467516]  [<ffffffff811e5877>] ? dev_queue_xmit+0x284/0x465
Apr 11 23:28:52 ROUTER kernel: [21095.467743]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
Apr 11 23:28:52 ROUTER kernel: [21095.468139]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
Apr 11 23:28:52 ROUTER kernel: [21095.468366]  [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b
Apr 11 23:28:52 ROUTER kernel: [21095.468591]  [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc
Apr 11 23:28:52 ROUTER kernel: [21095.468814]  [<ffffffff811e2c89>] net_tx_action+0xc2/0x120
Apr 11 23:28:52 ROUTER kernel: [21095.469042]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
Apr 11 23:28:52 ROUTER kernel: [21095.469267]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
Apr 11 23:28:52 ROUTER kernel: [21095.469491]  [<ffffffff81005543>] do_softirq+0x33/0x68
Apr 11 23:28:52 ROUTER kernel: [21095.469714]  [<ffffffff81039407>] irq_exit+0x36/0x75
Apr 11 23:28:52 ROUTER kernel: [21095.469936]  [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96
Apr 11 23:28:52 ROUTER kernel: [21095.470167]  [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-11 20:38 NULL pointer dereference panic in stable (2.6.33.2), amd64 Denys Fedorysychenko
@ 2010-04-11 22:35 ` Eric Dumazet
  2010-04-11 23:04   ` Denys Fedorysychenko
  2010-04-12  3:38   ` Krishna Kumar2
  0 siblings, 2 replies; 37+ messages in thread
From: Eric Dumazet @ 2010-04-11 22:35 UTC (permalink / raw)
  To: Denys Fedorysychenko; +Cc: netdev, Krishna Kumar, David Miller

Le dimanche 11 avril 2010 à 23:38 +0300, Denys Fedorysychenko a écrit :
> Hi
> 
> Got recently NULL pointer dereference
> Note - it seems in 32-bit i didn't experience this issue.
> 
> Router with NAT, HFSC shapers terminated with bfifo qdiscs (some attached to 
> 802.1Q vlan's), userspace application using tun.
> Hardware - network cards e1000e, Core 2 based architecture (two Quad Xeon), 
> 
> full message received over netconsole attached
> More info can provide on request
> pièce jointe document texte brut (BUG)
> Apr 11 23:28:52 ROUTER kernel: [21095.453138] BUG: unable to handle kernel NULL pointer dereference at (null)
> Apr 11 23:28:52 ROUTER kernel: [21095.453334] IP: [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
> Apr 11 23:28:52 ROUTER kernel: [21095.453522] PGD 20b247067 PUD 20b24c067 PMD 0 
> Apr 11 23:28:52 ROUTER kernel: [21095.453704] Oops: 0000 [#1] SMP 
> Apr 11 23:28:52 ROUTER kernel: [21095.453880] last sysfs file: /sys/devices/virtual/misc/tun/dev
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] CPU 0 
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Pid: 2864, comm: globax Not tainted 2.6.33.2-build-0051-64 #4         /        
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP: 0010:[<ffffffff811e5877>]  [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RSP: 0000:ffff880028203db0  EFLAGS: 00010202
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff8802087d8d00
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RDX: ffff88021d818000 RSI: 0000000000000000 RDI: ffff8802135850e8
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RBP: ffff880028203de0 R08: ffff88021c01d89c R09: ffff88021c01dc00
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] R10: dead000000200200 R11: dead000000100100 R12: ffff88021e212b80
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] R13: ffff88021da51e80 R14: ffff8802135850e8 R15: ffff88021be62000
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] FS:  0000000000000000(0000) GS:ffff880028200000(0063) knlGS:00000000f7e51b10
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000 CR3: 000000020b248000 CR4: 00000000000006f0
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Process globax (pid: 2864, threadinfo ffff8802118c0000, task ffff8802182c2c20)
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Stack:
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  ffff88021d818000 ffff88021da51e80 0000000000000042 ffff88021da51e80
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> ffff88021be62000 ffff88021be62000 ffff880028203e00 ffffffffa01c12a9
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] <0> 0000000000000000 ffff8802135850e8 ffff880028203e50 ffffffff811e540e
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Call Trace:
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  <IRQ> 
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff811e2c89>] net_tx_action+0xc2/0x120
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81005543>] do_softirq+0x33/0x68
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81039407>] irq_exit+0x36/0x75
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  <EOI>
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] Code: e2 

> 48 8b 55 d0 // mov    -0x30(%rbp),%rdx

> 49 c1 e4 07 // shl    $0x7,%r12 

>  66 41 8b 86 a6 00 00 00 // mov 0xa6(%r14),%ax

> 4c 03 a2 00 03 00 00 // add 0x0300(%rdx),%r12

> 80 e4 cf  // and    $0xcf,%ah 

> 80 cc 20 // or     $0x20,%ah

> 49 8b 5c 24 08 // mov    0x8(%r12),%rbx   rcu_dereference(txq->qdisc);

> 66 41 89 86 a6 00 00 00 // mov    %ax,0xa6(%r14) 

> <48> 83 3b 00 // cmpq   $0x0,(%rbx)

> 0f 84 bb 00 00 00 // je ...

>  4c 8d ab 9c 00 00 00 4c 89 ef e8
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] RIP  [<ffffffff811e5877>] dev_queue_xmit+0x284/0x465
> Apr 11 23:28:52 ROUTER kernel: [21095.454001]  RSP <ffff880028203db0>
> Apr 11 23:28:52 ROUTER kernel: [21095.454001] CR2: 0000000000000000
> Apr 11 23:28:52 ROUTER kernel: [21095.462975] ---[ end trace 2421d995c1afd7c3 ]---
> Apr 11 23:28:52 ROUTER kernel: [21095.463199] Kernel panic - not syncing: Fatal exception in interrupt
> Apr 11 23:28:52 ROUTER kernel: [21095.463428] Pid: 2864, comm: globax Tainted: G      D    2.6.33.2-build-0051-64 #4
> Apr 11 23:28:52 ROUTER kernel: [21095.463819] Call Trace:
> Apr 11 23:28:52 ROUTER kernel: [21095.464036]  <IRQ>  [<ffffffff81259743>] panic+0xa0/0x161
> Apr 11 23:28:52 ROUTER kernel: [21095.464307]  [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20
> Apr 11 23:28:52 ROUTER kernel: [21095.464533]  [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c
> Apr 11 23:28:52 ROUTER kernel: [21095.464757]  [<ffffffff81006651>] oops_end+0xaa/0xba
> Apr 11 23:28:52 ROUTER kernel: [21095.464980]  [<ffffffff8101e653>] no_context+0x1f3/0x202
> Apr 11 23:28:52 ROUTER kernel: [21095.465211]  [<ffffffff810523a4>] ? tick_program_event+0x25/0x27
> Apr 11 23:28:52 ROUTER kernel: [21095.465437]  [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0
> Apr 11 23:28:52 ROUTER kernel: [21095.465668]  [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5
> Apr 11 23:28:52 ROUTER kernel: [21095.465907]  [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e]
> Apr 11 23:28:52 ROUTER kernel: [21095.466139]  [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25
> Apr 11 23:28:52 ROUTER kernel: [21095.466372]  [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e]
> Apr 11 23:28:52 ROUTER kernel: [21095.466605]  [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e]
> Apr 11 23:28:52 ROUTER kernel: [21095.466833]  [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10
> Apr 11 23:28:52 ROUTER kernel: [21095.467064]  [<ffffffff8101eb32>] do_page_fault+0x114/0x24a
> Apr 11 23:28:52 ROUTER kernel: [21095.467293]  [<ffffffff8125bc9f>] page_fault+0x1f/0x30
> Apr 11 23:28:52 ROUTER kernel: [21095.467516]  [<ffffffff811e5877>] ? dev_queue_xmit+0x284/0x465
> Apr 11 23:28:52 ROUTER kernel: [21095.467743]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
> Apr 11 23:28:52 ROUTER kernel: [21095.468139]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
> Apr 11 23:28:52 ROUTER kernel: [21095.468366]  [<ffffffff811f6482>] sch_direct_xmit+0x5d/0x16b
> Apr 11 23:28:52 ROUTER kernel: [21095.468591]  [<ffffffff811f664c>] __qdisc_run+0xbc/0xdc
> Apr 11 23:28:52 ROUTER kernel: [21095.468814]  [<ffffffff811e2c89>] net_tx_action+0xc2/0x120
> Apr 11 23:28:52 ROUTER kernel: [21095.469042]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
> Apr 11 23:28:52 ROUTER kernel: [21095.469267]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
> Apr 11 23:28:52 ROUTER kernel: [21095.469491]  [<ffffffff81005543>] do_softirq+0x33/0x68
> Apr 11 23:28:52 ROUTER kernel: [21095.469714]  [<ffffffff81039407>] irq_exit+0x36/0x75
> Apr 11 23:28:52 ROUTER kernel: [21095.469936]  [<ffffffff81016f63>] smp_apic_timer_interrupt+0x88/0x96
> Apr 11 23:28:52 ROUTER kernel: [21095.470167]  [<ffffffff81003293>] apic_timer_interrupt+0x13/0x20

Hi Denys !

txq->qdisc is NULL at this point, I dont think its even possible, so my
random guess is dev_pick_tx() got an queue_index >
dev->real_num_tx_queues


        txq = dev_pick_tx(dev, skb);
        q = rcu_dereference(txq->qdisc);  // q = NULL

#ifdef CONFIG_NET_CLS_ACT
        skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
#endif
        if (q->enqueue) {  // crash in dereference
                rc = __dev_xmit_skb(skb, q, dev, txq);
                goto out;
        }


I believe the following lines from dev_pick_tx() are not the problem :

	if (sk && sk->sk_dst_cache) 
		sk_tx_queue_set(sk, queue_index);

It is IMHO not safe, because route for this socket might have just
changed and we are transmitting an old packet (queued some milli seconds
before, when route was different).

We then memorize a queue_index that might be too big for the new device
of new selected route.

Next packet we want to transmit will take the cached value of
queue_index, correct for old device, maybe not correct for new device.

You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe

commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
Author: Krishna Kumar <krkumar2@in.ibm.com>
Date:   Mon Oct 19 23:50:07 2009 +0000

    net: Use sk_tx_queue_mapping for connected sockets
    
    For connected sockets, the first run of dev_pick_tx saves the
    calculated txq in sk_tx_queue_mapping. This is not saved if
    either the device has a queue select or the socket is not
    connected. Next iterations of dev_pick_tx uses the cached value
    of sk_tx_queue_mapping.
    
    Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/dev.c b/net/core/dev.c
index 28b0b9e..fa88dcd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash);
 static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 					struct sk_buff *skb)
 {
-	const struct net_device_ops *ops = dev->netdev_ops;
-	u16 queue_index = 0;
+	u16 queue_index;
+	struct sock *sk = skb->sk;
+
+	if (sk_tx_queue_recorded(sk)) {
+		queue_index = sk_tx_queue_get(sk);
+	} else {
+		const struct net_device_ops *ops = dev->netdev_ops;
 
-	if (ops->ndo_select_queue)
-		queue_index = ops->ndo_select_queue(dev, skb);
-	else if (dev->real_num_tx_queues > 1)
-		queue_index = skb_tx_hash(dev, skb);
+		if (ops->ndo_select_queue) {
+			queue_index = ops->ndo_select_queue(dev, skb);
+		} else {
+			queue_index = 0;
+			if (dev->real_num_tx_queues > 1)
+				queue_index = skb_tx_hash(dev, skb);
+
+			if (sk && sk->sk_dst_cache)
+				sk_tx_queue_set(sk, queue_index);
+		}
+	}
 
 	skb_set_queue_mapping(skb, queue_index);
 	return netdev_get_tx_queue(dev, queue_index);



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-11 22:35 ` Eric Dumazet
@ 2010-04-11 23:04   ` Denys Fedorysychenko
  2010-04-11 23:11     ` Eric Dumazet
  2010-04-12  3:38   ` Krishna Kumar2
  1 sibling, 1 reply; 37+ messages in thread
From: Denys Fedorysychenko @ 2010-04-11 23:04 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Krishna Kumar, David Miller

> It is IMHO not safe, because route for this socket might have just
> changed and we are transmitting an old packet (queued some milli seconds
> before, when route was different).
> 
> We then memorize a queue_index that might be too big for the new device
> of new selected route.
> 
> Next packet we want to transmit will take the cached value of
> queue_index, correct for old device, maybe not correct for new device.
Yes, it is possible, i have there RIP with 1k+ routes, changing non-stop.

> 
> You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> 
> commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
I will try to revert it now. But to trigger bug probably i need 1-2 days.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-11 23:04   ` Denys Fedorysychenko
@ 2010-04-11 23:11     ` Eric Dumazet
  2010-04-11 23:36       ` Denys Fedorysychenko
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-11 23:11 UTC (permalink / raw)
  To: Denys Fedorysychenko; +Cc: netdev, Krishna Kumar, David Miller

Le lundi 12 avril 2010 à 02:04 +0300, Denys Fedorysychenko a écrit :
> > It is IMHO not safe, because route for this socket might have just
> > changed and we are transmitting an old packet (queued some milli seconds
> > before, when route was different).
> > 
> > We then memorize a queue_index that might be too big for the new device
> > of new selected route.
> > 
> > Next packet we want to transmit will take the cached value of
> > queue_index, correct for old device, maybe not correct for new device.
> Yes, it is possible, i have there RIP with 1k+ routes, changing non-stop.
> 
> > 
> > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> > 
> > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> I will try to revert it now. But to trigger bug probably i need 1-2 days.

I can try to reproduce bug here, with a multiqueue device, a non
multiqueue device, and changing routes while transmitting bulk of
packets, and of course some trafic shaping to slow down xmits.

Could you send :

ifconfig -a
lspci

Thanks



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-11 23:11     ` Eric Dumazet
@ 2010-04-11 23:36       ` Denys Fedorysychenko
  0 siblings, 0 replies; 37+ messages in thread
From: Denys Fedorysychenko @ 2010-04-11 23:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Krishna Kumar, David Miller

On Monday 12 April 2010 02:11:23 Eric Dumazet wrote:
> Le lundi 12 avril 2010 à 02:04 +0300, Denys Fedorysychenko a écrit :
> > > It is IMHO not safe, because route for this socket might have just
> > > changed and we are transmitting an old packet (queued some milli
> > > seconds before, when route was different).
> > >
> > > We then memorize a queue_index that might be too big for the new device
> > > of new selected route.
> > >
> > > Next packet we want to transmit will take the cached value of
> > > queue_index, correct for old device, maybe not correct for new device.
> >
> > Yes, it is possible, i have there RIP with 1k+ routes, changing non-stop.
> >
> > > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> > >
> > > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> >
> > I will try to revert it now. But to trigger bug probably i need 1-2 days.
> 
> I can try to reproduce bug here, with a multiqueue device, a non
> multiqueue device, and changing routes while transmitting bulk of
> packets, and of course some trafic shaping to slow down xmits.
> 
> Could you send :
> 
> ifconfig -a
> lspci
A bit more even, hopefully it is useful.
Additional note - i have few blackhole routes, so if specific route announce 
will disappear - it will be less specific blackhole route.

I have also some source routing (ip rule).


lspci

00:00.0 Host bridge: Intel Corporation 5000P Chipset Memory Controller Hub 
(rev b1)
00:02.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 
2 (rev b1)
00:03.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 
3 (rev b1)
00:04.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 
4-5 (rev b1)
00:05.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 
5 (rev b1)
00:06.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x8 Port 
6-7 (rev b1)
00:07.0 PCI bridge: Intel Corporation 5000 Series Chipset PCI Express x4 Port 
7 (rev b1)
00:10.0 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 
b1)
00:10.1 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 
b1)
00:10.2 Host bridge: Intel Corporation 5000 Series Chipset FSB Registers (rev 
b1)
00:11.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers 
(rev b1)
00:13.0 Host bridge: Intel Corporation 5000 Series Chipset Reserved Registers 
(rev b1)
00:15.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 
b1)
00:16.0 Host bridge: Intel Corporation 5000 Series Chipset FBD Registers (rev 
b1)
00:1c.0 PCI bridge: Intel Corporation 631xESB/632xESB/3100 Chipset PCI Express 
Root Port 1 (rev 09)
00:1d.0 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI 
USB Controller #1 (rev 09)
00:1d.1 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI 
USB Controller #2 (rev 09)
00:1d.2 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI 
USB Controller #3 (rev 09)
00:1d.3 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset UHCI 
USB Controller #4 (rev 09)
00:1d.7 USB Controller: Intel Corporation 631xESB/632xESB/3100 Chipset EHCI 
USB2 Controller (rev 09)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev d9)
00:1f.0 ISA bridge: Intel Corporation 631xESB/632xESB/3100 Chipset LPC 
Interface Controller (rev 09)
00:1f.1 IDE interface: Intel Corporation 631xESB/632xESB IDE Controller (rev 
09)
00:1f.2 SATA controller: Intel Corporation 631xESB/632xESB SATA AHCI 
Controller (rev 09)
00:1f.3 SMBus: Intel Corporation 631xESB/632xESB/3100 Chipset SMBus Controller 
(rev 09)
01:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Upstream 
Port (rev 01)
01:00.3 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express to PCI-X 
Bridge (rev 01)
02:00.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream 
Port E1 (rev 01)
02:02.0 PCI bridge: Intel Corporation 6311ESB/6321ESB PCI Express Downstream 
Port E3 (rev 01)
04:00.0 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet 
Controller (Copper) (rev 01)
04:00.1 Ethernet controller: Intel Corporation 80003ES2LAN Gigabit Ethernet 
Controller (Copper) (rev 01)
07:00.0 RAID bus controller: Adaptec AAC-RAID (rev 09)
0b:00.0 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet 
Controller (rev 06)
0b:00.1 Ethernet controller: Intel Corporation 82571EB Gigabit Ethernet 
Controller (rev 06)
0c:05.0 VGA compatible controller: ASPEED Technology, Inc. AST2000

eth0      Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:194.146.153.17  Bcast:0.0.0.0  Mask:255.255.255.224                                                                                                          
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:1047262147 errors:0 dropped:0 overruns:0 frame:0                                                                                                            
          TX packets:1017399476 errors:0 dropped:0 overruns:0 carrier:0                                                                                                          
          collisions:0 txqueuelen:1000                                                                                                                                           
          RX bytes:698944316059 (650.9 GiB)  TX bytes:694648889407 (646.9 GiB)                                                                                                   
          Memory:fc6e0000-fc700000                                                                                                                                               
                                                                                                                                                                                 
eth0.165  Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:172.30.1.2  Bcast:0.0.0.0  Mask:255.255.255.0                                                                                                                
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:5375126 errors:0 dropped:0 overruns:0 frame:0                                                                                                               
          TX packets:3446299 errors:0 dropped:2 overruns:0 carrier:0                                                                                                             
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:1910233297 (1.7 GiB)  TX bytes:814740452 (776.9 MiB)                                                                                                          
                                                                                                                                                                                 
eth0.32   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:10.22.22.2  Bcast:0.0.0.0  Mask:255.255.255.0                                                                                                                
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:96167917 errors:0 dropped:0 overruns:0 frame:0                                                                                                              
          TX packets:166845841 errors:0 dropped:2 overruns:0 carrier:0                                                                                                           
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:14589314556 (13.5 GiB)  TX bytes:126771970521 (118.0 GiB)                                                                                                     
                                                                                                                                                                                 
eth0.33   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:10.152.152.2  Bcast:0.0.0.0  Mask:255.255.255.248                                                                                                            
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:10171808 errors:0 dropped:0 overruns:0 frame:0                                                                                                              
          TX packets:130698572 errors:0 dropped:0 overruns:0 carrier:0                                                                                                           
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:2690522850 (2.5 GiB)  TX bytes:25223310290 (23.4 GiB)                                                                                                         
                                                                                                                                                                                 
eth0.34   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:10.25.25.1  Bcast:0.0.0.0  Mask:255.255.255.0                                                                                                                
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:133415723 errors:0 dropped:0 overruns:0 frame:0                                                                                                             
          TX packets:1352 errors:0 dropped:1 overruns:0 carrier:0                                                                                                                
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:140007983122 (130.3 GiB)  TX bytes:117870 (115.1 KiB)                                                                                                         
                                                                                                                                                                                 
eth0.35   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                                                                                                                     
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
                                                                                                                                                                                 
eth0.36   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:1474 errors:0 dropped:0 overruns:0 frame:0                                                                                                                  
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:125741 (122.7 KiB)  TX bytes:0 (0.0 B)                                                                                                                        
                                                                                                                                                                                 
eth0.38   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:10.0.4.1  Bcast:0.0.0.0  Mask:255.255.255.248                                                                                                                
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:674 errors:0 dropped:0 overruns:0 frame:0                                                                                                                   
          TX packets:188760217 errors:0 dropped:0 overruns:0 carrier:0                                                                                                           
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:31004 (30.2 KiB)  TX bytes:212060676669 (197.4 GiB)                                                                                                           
                                                                                                                                                                                 
eth0.39   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:10.22.24.1  Bcast:0.0.0.0  Mask:255.255.255.0                                                                                                                
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:270892875 errors:0 dropped:0 overruns:0 frame:0                                                                                                             
          TX packets:309713487 errors:0 dropped:3 overruns:0 carrier:0                                                                                                           
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:77224626928 (71.9 GiB)  TX bytes:215883462565 (201.0 GiB)                                                                                                     
                                                                                                                                                                                 
eth0.40   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:62.84.94.13  Bcast:0.0.0.0  Mask:255.255.255.248                                                                                                             
          BROADCAST MULTICAST  MTU:1500  Metric:1                                                                                                                                
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                                                                                                                     
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
                                                                                                                                                                                 
eth0.41   Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CC                                                                                                                          
          inet addr:2.2.3.1  Bcast:0.0.0.0  Mask:255.255.255.128                                                                                                                 
          BROADCAST MULTICAST  MTU:1500  Metric:1                                                                                                                                
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                                                                                                                     
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
                                                                                                                                                                                 
eth1      Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CD                                                                                                                          
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1                                                                                                             
          RX packets:949912 errors:0 dropped:0 overruns:0 frame:0                                                                                                                
          TX packets:192 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                 
          collisions:0 txqueuelen:1000                                                                                                                                           
          RX bytes:429565276 (409.6 MiB)  TX bytes:8064 (7.8 KiB)                                                                                                                
          Memory:fc6c0000-fc6e0000                                                                                                                                               
                                                                                                                                                                                 
eth1.2    Link encap:Ethernet  HWaddr 00:1C:C0:19:23:80                                                                                                                          
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:5462 errors:0 dropped:0 overruns:0 frame:0                                                                                                                  
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:273100 (266.6 KiB)  TX bytes:0 (0.0 B)                                                                                                                        
                                                                                                                                                                                 
eth1.3    Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CD                                                                                                                          
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:5462 errors:0 dropped:0 overruns:0 frame:0                                                                                                                  
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:273100 (266.6 KiB)  TX bytes:0 (0.0 B)                                                                                                                        
                                                                                                                                                                                 
eth1.4    Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CD                                                                                                                          
          inet addr:10.154.154.2  Bcast:0.0.0.0  Mask:255.255.255.0                                                                                                              
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1                                                                                                                     
          RX packets:5461 errors:0 dropped:0 overruns:0 frame:0                                                                                                                  
          TX packets:192 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                 
          collisions:0 txqueuelen:0                                                                                                                                              
          RX bytes:273050 (266.6 KiB)  TX bytes:8064 (7.8 KiB)                                                                                                                   
                                                                                                                                                                                 
eth2      Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CE                                                                                                                          
          BROADCAST MULTICAST  MTU:1500  Metric:1                                                                                                                                
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                                                                                                                     
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:1000                                                                                                                                           
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
          Memory:fcde0000-fce00000                                                                                                                                               
                                                                                                                                                                                 
eth3      Link encap:Ethernet  HWaddr 00:1E:68:57:4B:CF                                                                                                                          
          BROADCAST MULTICAST  MTU:1500  Metric:1                                                                                                                                
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0                                                                                                                     
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0                                                                                                                   
          collisions:0 txqueuelen:1000                                                                                                                                           
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)                                                                                                                                 
          Memory:fcd80000-fcda0000                                                                                                                                               
                                                                                                                                                                                 
ifb0      Link encap:Ethernet  HWaddr 6A:73:52:ED:4A:D6                                                                                                                          
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1                                                                                                                         
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ifb1      Link encap:Ethernet  HWaddr C2:97:7A:32:8B:59
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ifb2      Link encap:Ethernet  HWaddr 96:AC:DD:0D:68:B0
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ifb3      Link encap:Ethernet  HWaddr FA:D9:59:C6:16:E7
          UP BROADCAST RUNNING NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:32
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:3192 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3192 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:222919 (217.6 KiB)  TX bytes:222919 (217.6 KiB)

tunx0     Link encap:UNSPEC  HWaddr 
00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:1.1.1.2  P-t-P:1.1.1.2  Mask:255.255.255.255
          UP POINTOPOINT RUNNING  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12531732 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:0 (0.0 B)  TX bytes:1888404300 (1.7 GiB)

Router-Dora ~ # tc -d qdisc
qdisc hfsc 1: dev eth0 root refcnt 2
qdisc bfifo 101: dev eth0 parent 1:101 limit 12500000b
qdisc bfifo 111: dev eth0 parent 1:111 limit 3125Kb
qdisc bfifo 112: dev eth0 parent 1:112 limit 3125Kb
qdisc pfifo_fast 0: dev eth1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 
1 1 1 1 1 1 1
qdisc hfsc 1: dev eth0.33 root refcnt 2
 linklayer ethernet overhead 4 mpu 64 mtu 1500 tsize 1436
qdisc bfifo 110: dev eth0.33 parent 1:110 limit 3000000b
qdisc bfifo 120: dev eth0.33 parent 1:120 limit 100000b
qdisc bfifo 130: dev eth0.33 parent 1:130 limit 100000b
qdisc bfifo 140: dev eth0.33 parent 1:140 limit 100000b
qdisc bfifo 150: dev eth0.33 parent 1:150 limit 1000000b
qdisc bfifo 160: dev eth0.33 parent 1:160 limit 100000b
qdisc bfifo 170: dev eth0.33 parent 1:170 limit 100000b
qdisc bfifo 180: dev eth0.33 parent 1:180 limit 100000b
qdisc bfifo 190: dev eth0.33 parent 1:190 limit 100000b
qdisc hfsc 1: dev eth0.38 root refcnt 2
 linklayer ethernet overhead 4 mpu 64 mtu 2047 tsize 512
qdisc bfifo 3: dev eth0.38 parent 1:3 limit 5000000b
qdisc bfifo 4: dev eth0.38 parent 1:4 limit 5000000b
qdisc hfsc 1: dev eth0.39 root refcnt 2
 linklayer ethernet overhead 4 mpu 64 mtu 2047 tsize 512
qdisc bfifo 3: dev eth0.39 parent 1:3 limit 3000000b
qdisc bfifo 20: dev eth0.39 parent 1:20 limit 20000000b
qdisc bfifo 31: dev eth0.39 parent 1:31 limit 1000000b
qdisc bfifo 30: dev eth0.39 parent 1:30 limit 5000000b
qdisc ingress ffff: dev eth1.3 parent ffff:fff1 ----------------
qdisc pfifo_fast 0: dev ifb0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 
1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev ifb1 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 
1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev ifb2 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 
1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev ifb3 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 
1 1 1 1 1 1 1
qdisc pfifo_fast 0: dev tunx0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 
1 1 1 1 1 1 1


Router-Dora ~ # ip link                                                                                                                                                          
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN                                                                                                              
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00                                                                                                                        
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 
1000                                                                                                
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff                                                                                                                           
3: eth1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast 
state UP qlen 1000
    link/ether 00:1e:68:57:4b:cd brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:1e:68:57:4b:ce brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether 00:1e:68:57:4b:cf brd ff:ff:ff:ff:ff:ff
6: eth0.32@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
7: eth0.33@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state 
UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
8: eth0.34@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
9: eth0.35@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
10: eth0.36@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
11: eth0.38@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state 
UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
12: eth0.39@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state 
UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
13: eth0.40@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
14: eth0.41@eth0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
15: eth0.165@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP
    link/ether 00:1e:68:57:4b:cc brd ff:ff:ff:ff:ff:ff
16: eth1.2@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP
    link/ether 00:1c:c0:19:23:80 brd ff:ff:ff:ff:ff:ff
17: eth1.3@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP
    link/ether 00:1e:68:57:4b:cd brd ff:ff:ff:ff:ff:ff
18: eth1.4@eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
state UP
    link/ether 00:1e:68:57:4b:cd brd ff:ff:ff:ff:ff:ff
19: ifb0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state 
UNKNOWN qlen 32
    link/ether 6a:73:52:ed:4a:d6 brd ff:ff:ff:ff:ff:ff
20: ifb1: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state 
UNKNOWN qlen 32
    link/ether c2:97:7a:32:8b:59 brd ff:ff:ff:ff:ff:ff
21: ifb2: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state 
UNKNOWN qlen 32
    link/ether 96:ac:dd:0d:68:b0 brd ff:ff:ff:ff:ff:ff
22: ifb3: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state 
UNKNOWN qlen 32
    link/ether fa:d9:59:c6:16:e7 brd ff:ff:ff:ff:ff:ff
23: tunx0: <POINTOPOINT,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN 
qlen 1024
    link/[65534]





> 
> Thanks
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-11 22:35 ` Eric Dumazet
  2010-04-11 23:04   ` Denys Fedorysychenko
@ 2010-04-12  3:38   ` Krishna Kumar2
  2010-04-12  6:01     ` Eric Dumazet
  1 sibling, 1 reply; 37+ messages in thread
From: Krishna Kumar2 @ 2010-04-12  3:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Denys Fedorysychenko

Hi Eric,

Eric Dumazet <eric.dumazet@gmail.com> wrote on 04/12/2010 04:05:53 AM:

> I believe the following lines from dev_pick_tx() are not the problem :
>
>    if (sk && sk->sk_dst_cache)
>       sk_tx_queue_set(sk, queue_index);
>
> It is IMHO not safe, because route for this socket might have just
> changed and we are transmitting an old packet (queued some milli seconds
> before, when route was different).
>
> We then memorize a queue_index that might be too big for the new device
> of new selected route.
>
> Next packet we want to transmit will take the cached value of
> queue_index, correct for old device, maybe not correct for new device.

When route changes, I think my patch had reset sk->sk_tx_queue_mapping
by calling sk_tx_queue_clear. I don't know if I missed any path where
the route changes and sk_dst_reset() was not called.

The following might be better to prove the panic is due to this, since
your suggestion will hide a panic that happens somewhat rare (according
to Denys):

      if (sk_tx_queue_recorded(sk)) {
            queue_index = sk_tx_queue_get(sk);
+           queue_index = dev_cap_txqueue(dev, queue_index);
      } else {

Thanks,

- KK

> You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
>
> commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> Author: Krishna Kumar <krkumar2@in.ibm.com>
> Date:   Mon Oct 19 23:50:07 2009 +0000
>
>     net: Use sk_tx_queue_mapping for connected sockets
>
>     For connected sockets, the first run of dev_pick_tx saves the
>     calculated txq in sk_tx_queue_mapping. This is not saved if
>     either the device has a queue select or the socket is not
>     connected. Next iterations of dev_pick_tx uses the cached value
>     of sk_tx_queue_mapping.
>
>     Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 28b0b9e..fa88dcd 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash);
>  static struct netdev_queue *dev_pick_tx(struct net_device *dev,
>                 struct sk_buff *skb)
>  {
> -   const struct net_device_ops *ops = dev->netdev_ops;
> -   u16 queue_index = 0;
> +   u16 queue_index;
> +   struct sock *sk = skb->sk;
> +
> +   if (sk_tx_queue_recorded(sk)) {
> +      queue_index = sk_tx_queue_get(sk);
> +   } else {
> +      const struct net_device_ops *ops = dev->netdev_ops;
>
> -   if (ops->ndo_select_queue)
> -      queue_index = ops->ndo_select_queue(dev, skb);
> -   else if (dev->real_num_tx_queues > 1)
> -      queue_index = skb_tx_hash(dev, skb);
> +      if (ops->ndo_select_queue) {
> +         queue_index = ops->ndo_select_queue(dev, skb);
> +      } else {
> +         queue_index = 0;
> +         if (dev->real_num_tx_queues > 1)
> +            queue_index = skb_tx_hash(dev, skb);
> +
> +         if (sk && sk->sk_dst_cache)
> +            sk_tx_queue_set(sk, queue_index);
> +      }
> +   }
>
>     skb_set_queue_mapping(skb, queue_index);
>     return netdev_get_tx_queue(dev, queue_index);
>
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-12  3:38   ` Krishna Kumar2
@ 2010-04-12  6:01     ` Eric Dumazet
  2010-04-12  7:18       ` Eric Dumazet
  2010-04-12  7:54       ` NULL pointer dereference panic in stable (2.6.33.2), amd64 Krishna Kumar2
  0 siblings, 2 replies; 37+ messages in thread
From: Eric Dumazet @ 2010-04-12  6:01 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: David Miller, netdev, Denys Fedorysychenko

Le lundi 12 avril 2010 à 09:08 +0530, Krishna Kumar2 a écrit :
> Hi Eric,
> 
> Eric Dumazet <eric.dumazet@gmail.com> wrote on 04/12/2010 04:05:53 AM:
> 
> > I believe the following lines from dev_pick_tx() are not the problem :
> >
> >    if (sk && sk->sk_dst_cache)
> >       sk_tx_queue_set(sk, queue_index);
> >
> > It is IMHO not safe, because route for this socket might have just
> > changed and we are transmitting an old packet (queued some milli seconds
> > before, when route was different).
> >
> > We then memorize a queue_index that might be too big for the new device
> > of new selected route.
> >
> > Next packet we want to transmit will take the cached value of
> > queue_index, correct for old device, maybe not correct for new device.
> 
> When route changes, I think my patch had reset sk->sk_tx_queue_mapping
> by calling sk_tx_queue_clear. I don't know if I missed any path where
> the route changes and sk_dst_reset() was not called.
> 

Problem is when you reset sk->sk_tx_queue_mapping at the very moment
route (or destination) changes, we might have old packets queued in tx
queues, of the old ethernet device (eth0 : multi queue compatable)

2) Application does a sendmsg() or connect() call and sk->sk_dst_cache
is rebuild, it points to a dst_entry referring a new device (eth1 : non
multiqueue)

3) When one old packet finally is transmitted, we do :

	queue_index = 1; // any value > 0

	if (sk && sk->sk_dst_cache)
		sk_tx_queue_set(sk, queue_index); // remember a >0 value

4) application does a sendmsg(), enqueues a new skb on eth1

5) We re-enter dev_pick_tx(), and consider cached value in 3) is valid.
   we pick a non existent txq for eth1 device.

6) We crash.


> The following might be better to prove the panic is due to this, since
> your suggestion will hide a panic that happens somewhat rare (according
> to Denys):
> 
>       if (sk_tx_queue_recorded(sk)) {
>             queue_index = sk_tx_queue_get(sk);
> +           queue_index = dev_cap_txqueue(dev, queue_index);
>       } else {
> 

Sure, but I thought I was clear enough to prove this commit was wrong,
and we have to find a fix.

> Thanks,
> 
> - KK
> 
> > You could try to revert commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> >
> > commit a4ee3ce3293dc931fab19beb472a8bde1295aebe
> > Author: Krishna Kumar <krkumar2@in.ibm.com>
> > Date:   Mon Oct 19 23:50:07 2009 +0000
> >
> >     net: Use sk_tx_queue_mapping for connected sockets
> >
> >     For connected sockets, the first run of dev_pick_tx saves the
> >     calculated txq in sk_tx_queue_mapping. This is not saved if
> >     either the device has a queue select or the socket is not
> >     connected. Next iterations of dev_pick_tx uses the cached value
> >     of sk_tx_queue_mapping.
> >
> >     Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 28b0b9e..fa88dcd 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -1791,13 +1791,25 @@ EXPORT_SYMBOL(skb_tx_hash);
> >  static struct netdev_queue *dev_pick_tx(struct net_device *dev,
> >                 struct sk_buff *skb)
> >  {
> > -   const struct net_device_ops *ops = dev->netdev_ops;
> > -   u16 queue_index = 0;
> > +   u16 queue_index;
> > +   struct sock *sk = skb->sk;
> > +
> > +   if (sk_tx_queue_recorded(sk)) {
> > +      queue_index = sk_tx_queue_get(sk);
> > +   } else {
> > +      const struct net_device_ops *ops = dev->netdev_ops;
> >
> > -   if (ops->ndo_select_queue)
> > -      queue_index = ops->ndo_select_queue(dev, skb);
> > -   else if (dev->real_num_tx_queues > 1)
> > -      queue_index = skb_tx_hash(dev, skb);
> > +      if (ops->ndo_select_queue) {
> > +         queue_index = ops->ndo_select_queue(dev, skb);
> > +      } else {
> > +         queue_index = 0;
> > +         if (dev->real_num_tx_queues > 1)
> > +            queue_index = skb_tx_hash(dev, skb);
> > +
> > +         if (sk && sk->sk_dst_cache)
> > +            sk_tx_queue_set(sk, queue_index);
> > +      }
> > +   }
> >
> >     skb_set_queue_mapping(skb, queue_index);
> >     return netdev_get_tx_queue(dev, queue_index);
> >
> >
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-12  6:01     ` Eric Dumazet
@ 2010-04-12  7:18       ` Eric Dumazet
  2010-04-12  7:36         ` David Miller
  2010-04-15  6:52         ` David Miller
  2010-04-12  7:54       ` NULL pointer dereference panic in stable (2.6.33.2), amd64 Krishna Kumar2
  1 sibling, 2 replies; 37+ messages in thread
From: Eric Dumazet @ 2010-04-12  7:18 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: David Miller, netdev, Denys Fedorysychenko

Le lundi 12 avril 2010 à 08:01 +0200, Eric Dumazet a écrit :

> Problem is when you reset sk->sk_tx_queue_mapping at the very moment
> route (or destination) changes, we might have old packets queued in tx
> queues, of the old ethernet device (eth0 : multi queue compatable)
> 
> 2) Application does a sendmsg() or connect() call and sk->sk_dst_cache
> is rebuild, it points to a dst_entry referring a new device (eth1 : non
> multiqueue)
> 
> 3) When one old packet finally is transmitted, we do :
> 
> 	queue_index = 1; // any value > 0
> 
> 	if (sk && sk->sk_dst_cache)
> 		sk_tx_queue_set(sk, queue_index); // remember a >0 value
> 
> 4) application does a sendmsg(), enqueues a new skb on eth1
> 
> 5) We re-enter dev_pick_tx(), and consider cached value in 3) is valid.
>    we pick a non existent txq for eth1 device.
> 
> 6) We crash.
> 
> 
> > The following might be better to prove the panic is due to this, since
> > your suggestion will hide a panic that happens somewhat rare (according
> > to Denys):
> > 
> >       if (sk_tx_queue_recorded(sk)) {
> >             queue_index = sk_tx_queue_get(sk);
> > +           queue_index = dev_cap_txqueue(dev, queue_index);
> >       } else {
> > 
> 
> Sure, but I thought I was clear enough to prove this commit was wrong,
> and we have to find a fix.

Here the patch I cooked, this is a stable candidate, once tested and
acknowledged.

[PATCH] net: dev_pick_tx() fix

When dev_pick_tx() caches tx queue_index on a socket, we must check
socket dst_entry matches skb one, or risk a crash later, as reported by
Denys Fedorysychenko, if old packets are in flight during a route
change, involving devices with different number of queues.

Bug introduced by commit a4ee3ce3
(net: Use sk_tx_queue_mapping for connected sockets)

Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/dev.c b/net/core/dev.c
index 1c8a0ce..92584bf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1989,8 +1989,12 @@ static struct netdev_queue *dev_pick_tx(struct net_device *dev,
 			if (dev->real_num_tx_queues > 1)
 				queue_index = skb_tx_hash(dev, skb);
 
-			if (sk && sk->sk_dst_cache)
-				sk_tx_queue_set(sk, queue_index);
+			if (sk) {
+				struct dst_entry *dst = rcu_dereference(sk->sk_dst_cache);
+
+				if (dst && skb_dst(skb) == dst)
+					sk_tx_queue_set(sk, queue_index);
+			}
 		}
 	}
 



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-12  7:18       ` Eric Dumazet
@ 2010-04-12  7:36         ` David Miller
  2010-04-15  6:52         ` David Miller
  1 sibling, 0 replies; 37+ messages in thread
From: David Miller @ 2010-04-12  7:36 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 12 Apr 2010 09:18:17 +0200

> Here the patch I cooked, this is a stable candidate, once tested and
> acknowledged.
> 
> [PATCH] net: dev_pick_tx() fix
> 
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> 
> Bug introduced by commit a4ee3ce3
> (net: Use sk_tx_queue_mapping for connected sockets)
> 
> Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good Eric.  I'll integrate this into net-2.6 and also queue it
up for -stable in a day or two unless some problem is discovered.

Thanks!

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-12  6:01     ` Eric Dumazet
  2010-04-12  7:18       ` Eric Dumazet
@ 2010-04-12  7:54       ` Krishna Kumar2
  2010-04-12  9:31         ` Eric Dumazet
  1 sibling, 1 reply; 37+ messages in thread
From: Krishna Kumar2 @ 2010-04-12  7:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, Denys Fedorysychenko

Hi Eric,

Thanks for your patch, just one question on it though:

Eric Dumazet <eric.dumazet@gmail.com> wrote on 04/12/2010 11:31:51 AM:

> > When route changes, I think my patch had reset sk->sk_tx_queue_mapping
> > by calling sk_tx_queue_clear. I don't know if I missed any path where
> > the route changes and sk_dst_reset() was not called.
> >
>
> Problem is when you reset sk->sk_tx_queue_mapping at the very moment
> route (or destination) changes, we might have old packets queued in tx
> queues, of the old ethernet device (eth0 : multi queue compatable)

> 2) Application does a sendmsg() or connect() call and sk->sk_dst_cache
> is rebuild, it points to a dst_entry referring a new device (eth1 : non
> multiqueue)
>
> 3) When one old packet finally is transmitted, we do :
>
>    queue_index = 1; // any value > 0
>
>    if (sk && sk->sk_dst_cache)
>       sk_tx_queue_set(sk, queue_index); // remember a >0 value
>
> 4) application does a sendmsg(), enqueues a new skb on eth1
>
> 5) We re-enter dev_pick_tx(), and consider cached value in 3) is valid.
>    we pick a non existent txq for eth1 device.
>
> 6) We crash.
>
> > The following might be better to prove the panic is due to this, since
> > your suggestion will hide a panic that happens somewhat rare (according
> > to Denys):
> >
> >       if (sk_tx_queue_recorded(sk)) {
> >             queue_index = sk_tx_queue_get(sk);
> > +           queue_index = dev_cap_txqueue(dev, queue_index);
> >       } else {
> >
>
> Sure, but I thought I was clear enough to prove this commit was wrong,
> and we have to find a fix.

If the dst got changed between call to vlan_dev_hwaccel_hard_start_xmit
and it's call to dev_queue_xmit, that change to dst should have reset
sk_tx_queue_mapping to -1 by calling sk_tx_queue_clear (assuming that I
have changed in all paths, eg __sk_dst_reset), and thus result in a new
mapping in dev_pick_tx. Would the patch hide the actual bug where we do
not clear sk_tx_queue_mapping, eg __sk_dst_set does it? I agree the
patch will fix the panic, but this check could be removed if the code
which changes the dst is fixed to clear the mapping. I could check that
if you think this assumption is correct.

Thanks,

- KK


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-12  7:54       ` NULL pointer dereference panic in stable (2.6.33.2), amd64 Krishna Kumar2
@ 2010-04-12  9:31         ` Eric Dumazet
  2010-04-12 16:11           ` Denys Fedorysychenko
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-12  9:31 UTC (permalink / raw)
  To: Krishna Kumar2; +Cc: David Miller, netdev, Denys Fedorysychenko

Le lundi 12 avril 2010 à 13:24 +0530, Krishna Kumar2 a écrit :

> If the dst got changed between call to vlan_dev_hwaccel_hard_start_xmit
> and it's call to dev_queue_xmit, that change to dst should have reset
> sk_tx_queue_mapping to -1 by calling sk_tx_queue_clear (assuming that I
> have changed in all paths, eg __sk_dst_reset), and thus result in a new
> mapping in dev_pick_tx. Would the patch hide the actual bug where we do
> not clear sk_tx_queue_mapping, eg __sk_dst_set does it? I agree the
> patch will fix the panic, but this check could be removed if the code
> which changes the dst is fixed to clear the mapping. I could check that
> if you think this assumption is correct.
> 

I believe you focus on another problem. I am not saying we dont have
another bug (forgetting to reset sk_dst_cache somewhere).

I am only saying that when we want to cache the queue number on a given
socket, we have to make sure current packet routing decision was taken
on same dst_entries than current and future ones. Denys hit the problem
because of long delays caused by traffic shaping.

So the cache renew must be safe, which I tried to fix.

You are saying that cache invalidation might be missing from some paths.
I dont think so because I took an extensive look at these spots when
working on yet another RCU conversion two days ago (sk_dst_lock becomes
a spinlock). This was fresh in my mind, this is why I probably found
Denys problem origin so quickly ;)




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-12  9:31         ` Eric Dumazet
@ 2010-04-12 16:11           ` Denys Fedorysychenko
  2010-04-12 20:09             ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: Denys Fedorysychenko @ 2010-04-12 16:11 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Krishna Kumar2, David Miller, netdev

[-- Attachment #1: Type: Text/Plain, Size: 163 bytes --]

On Monday 12 April 2010 12:31:43 Eric Dumazet wrote:
.
Seems problem still remain. Patched kernel, but paniced now.
Btw, i dont have any multiqueue card, i think.

[-- Attachment #2: x.txt --]
[-- Type: text/plain, Size: 12174 bytes --]

Apr 12 18:46:58 80.83.17.1 dropbear[4843]: exit before auth: Disconnect received
Apr 12 18:46:59 80.83.17.1 dropbear[4845]: Child connection from 82.113.44.186:48692
Apr 12 18:46:59 80.83.17.1 dropbear[4844]: exit before auth: Disconnect received
Apr 12 18:46:59 80.83.17.1 kernel: [12598.956375] BUG: unable to handle kernel NULL pointer dereference at (null)
Apr 12 18:46:59 80.83.17.1 kernel: [12598.956571] IP: [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.956762] PGD 21debc067 PUD 21c881067 PMD 0 
Apr 12 18:46:59 80.83.17.1 kernel: [12598.956947] Oops: 0000 [#1] SMP 
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957126] last sysfs file: /sys/devices/virtual/vc/vcs3/dev
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957311] CPU 0 
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Pid: 0, comm: swapper Not tainted 2.6.33.2-build-0052test-64 #2         /        
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RIP: 0010:[<ffffffff811e587f>]  [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RSP: 0000:ffff880028203a30  EFLAGS: 00010202
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff880209d8a900
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RDX: ffff88021d870000 RSI: 0000000000000000 RDI: ffff88020a7b48e8
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RBP: ffff880028203a60 R08: ffff88021c8be89c R09: ffff88021c8bec00
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] R10: dead000000200200 R11: dead000000100100 R12: ffff88021f98a880
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] R13: ffff88021d5c0900 R14: ffff88020a7b48e8 R15: ffff88021cbad000
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CR2: 0000000000000000 CR3: 000000021c9d8000 CR4: 00000000000006f0
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Process swapper (pid: 0, threadinfo ffffffff81392000, task ffffffff813a1020)
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Stack:
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  ffff88021d870000 ffff88021d5c0900 0000000000000042 ffff88021d5c0900
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <0> ffff88021cbad000 ffff88021cbad000 ffff880028203a80 ffffffffa01c12a9
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <0> 0000000000000000 ffff88020a7b48e8 ffff880028203ad0 ffffffff811e540e
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Call Trace:
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  <IRQ> 
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811f648a>] sch_direct_xmit+0x5d/0x16b
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811f6654>] __qdisc_run+0xbc/0xdc
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e5939>] dev_queue_xmit+0x346/0x46d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8120a384>] ip_finish_output2+0x1c2/0x206
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8120a430>] ip_finish_output+0x68/0x6a
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8120a4d2>] ip_output+0xa0/0xa5
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81206d2e>] ip_forward_finish+0x2e/0x32
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81206ff4>] ip_forward+0x2c2/0x322
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81205ae0>] ip_rcv_finish+0x2f0/0x30a
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81205d77>] ip_rcv+0x27d/0x2a4
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8124ad48>] ? vlan_hwaccel_do_receive+0x2b/0xda
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e47b6>] netif_receive_skb+0x450/0x475
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e4909>] napi_skb_finish+0x24/0x3b
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8124b01b>] vlan_gro_receive+0x7c/0x81
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa015d6c5>] e1000_receive_skb+0x4a/0x65 [e1000e]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa015d8cb>] e1000_clean_rx_irq+0x1eb/0x29c [e1000e]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa015ebfb>] e1000_clean+0x75/0x22e [e1000e]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa0234d6c>] ? hfsc_dequeue+0x171/0x2a6 [sch_hfsc]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e4e56>] net_rx_action+0xa7/0x17a
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81005543>] do_softirq+0x33/0x68
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81039407>] irq_exit+0x36/0x75
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81004c3e>] do_IRQ+0xaa/0xc1
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8125ba93>] ret_from_intr+0x0/0xa
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  <EOI> 
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8100a0c7>] ? mwait_idle+0x66/0x6b
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81001d24>] ? enter_idle+0x20/0x22
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81001d7b>] cpu_idle+0x55/0x8d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8124bba5>] rest_init+0x79/0x7b
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff813fca70>] start_kernel+0x362/0x36d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff813fc0a8>] x86_64_start_reservations+0xa5/0xa9
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff813fc189>] x86_64_start_kernel+0xdd/0xe4
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Code: e2 48 8b 55 d0 49 c1 e4 07 66 41 8b 86 a6 00 00 00 4c 03 a2 00 03 00 00 80 e4 cf 80 cc 20 49 8b 5c 24 08 66 41 89 86 a6 00 00 00 <48> 83 3b 00 0f 84 bb 00 00 00 4c 8d ab 9c 00 00 00 4c 89 ef e8 
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RIP  [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  RSP <ffff880028203a30>
Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CR2: 0000000000000000
Apr 12 18:46:59 80.83.17.1 kernel: [12598.974856] ---[ end trace 739e5480c8ab784f ]---
Apr 12 18:46:59 80.83.17.1 kernel: [12598.975082] Kernel panic - not syncing: Fatal exception in interrupt
Apr 12 18:46:59 80.83.17.1 kernel: [12598.975311] Pid: 0, comm: swapper Tainted: G      D    2.6.33.2-build-0052test-64 #2
Apr 12 18:46:59 80.83.17.1 kernel: [12598.975706] Call Trace:
Apr 12 18:46:59 80.83.17.1 kernel: [12598.975920]  <IRQ>  [<ffffffff81259753>] panic+0xa0/0x161
Apr 12 18:46:59 80.83.17.1 kernel: [12598.976200]  [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20
Apr 12 18:46:59 80.83.17.1 kernel: [12598.976431]  [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c
Apr 12 18:46:59 80.83.17.1 kernel: [12598.976657]  [<ffffffff81006651>] oops_end+0xaa/0xba
Apr 12 18:46:59 80.83.17.1 kernel: [12598.976882]  [<ffffffff8101e653>] no_context+0x1f3/0x202
Apr 12 18:46:59 80.83.17.1 kernel: [12598.977113]  [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0
Apr 12 18:46:59 80.83.17.1 kernel: [12598.977347]  [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5
Apr 12 18:46:59 80.83.17.1 kernel: [12598.977577]  [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.977806]  [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25
Apr 12 18:46:59 80.83.17.1 kernel: [12598.978045]  [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.978282]  [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.978513]  [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10
Apr 12 18:46:59 80.83.17.1 kernel: [12598.978741]  [<ffffffff8101eb32>] do_page_fault+0x114/0x24a
Apr 12 18:46:59 80.83.17.1 kernel: [12598.978967]  [<ffffffff8125bc9f>] page_fault+0x1f/0x30
Apr 12 18:46:59 80.83.17.1 kernel: [12598.979196]  [<ffffffff811e587f>] ? dev_queue_xmit+0x28c/0x46d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.979426]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.979821]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
Apr 12 18:46:59 80.83.17.1 kernel: [12598.980055]  [<ffffffff811f648a>] sch_direct_xmit+0x5d/0x16b
Apr 12 18:46:59 80.83.17.1 kernel: [12598.980284]  [<ffffffff811f6654>] __qdisc_run+0xbc/0xdc
Apr 12 18:46:59 80.83.17.1 kernel: [12598.980514]  [<ffffffff811e5939>] dev_queue_xmit+0x346/0x46d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.980740]  [<ffffffff8120a384>] ip_finish_output2+0x1c2/0x206
Apr 12 18:46:59 80.83.17.1 kernel: [12598.980966]  [<ffffffff8120a430>] ip_finish_output+0x68/0x6a
Apr 12 18:46:59 80.83.17.1 kernel: [12598.981197]  [<ffffffff8120a4d2>] ip_output+0xa0/0xa5
Apr 12 18:46:59 80.83.17.1 kernel: [12598.981427]  [<ffffffff81206d2e>] ip_forward_finish+0x2e/0x32
Apr 12 18:46:59 80.83.17.1 kernel: [12598.981654]  [<ffffffff81206ff4>] ip_forward+0x2c2/0x322
Apr 12 18:46:59 80.83.17.1 kernel: [12598.981880]  [<ffffffff81205ae0>] ip_rcv_finish+0x2f0/0x30a
Apr 12 18:46:59 80.83.17.1 kernel: [12598.982111]  [<ffffffff81205d77>] ip_rcv+0x27d/0x2a4
Apr 12 18:46:59 80.83.17.1 kernel: [12598.982337]  [<ffffffff8124ad48>] ? vlan_hwaccel_do_receive+0x2b/0xda
Apr 12 18:46:59 80.83.17.1 kernel: [12598.982566]  [<ffffffff811e47b6>] netif_receive_skb+0x450/0x475
Apr 12 18:46:59 80.83.17.1 kernel: [12598.982793]  [<ffffffff811e4909>] napi_skb_finish+0x24/0x3b
Apr 12 18:46:59 80.83.17.1 kernel: [12598.983025]  [<ffffffff8124b01b>] vlan_gro_receive+0x7c/0x81
Apr 12 18:46:59 80.83.17.1 kernel: [12598.983260]  [<ffffffffa015d6c5>] e1000_receive_skb+0x4a/0x65 [e1000e]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.983492]  [<ffffffffa015d8cb>] e1000_clean_rx_irq+0x1eb/0x29c [e1000e]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.983727]  [<ffffffffa015ebfb>] e1000_clean+0x75/0x22e [e1000e]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.983955]  [<ffffffffa0234d6c>] ? hfsc_dequeue+0x171/0x2a6 [sch_hfsc]
Apr 12 18:46:59 80.83.17.1 kernel: [12598.984190]  [<ffffffff811e4e56>] net_rx_action+0xa7/0x17a
Apr 12 18:46:59 80.83.17.1 kernel: [12598.984416]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
Apr 12 18:46:59 80.83.17.1 kernel: [12598.984642]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
Apr 12 18:46:59 80.83.17.1 kernel: [12598.984866]  [<ffffffff81005543>] do_softirq+0x33/0x68
Apr 12 18:46:59 80.83.17.1 kernel: [12598.985097]  [<ffffffff81039407>] irq_exit+0x36/0x75
Apr 12 18:46:59 80.83.17.1 kernel: [12598.985323]  [<ffffffff81004c3e>] do_IRQ+0xaa/0xc1
Apr 12 18:46:59 80.83.17.1 kernel: [12598.985546]  [<ffffffff8125ba93>] ret_from_intr+0x0/0xa
Apr 12 18:46:59 80.83.17.1 kernel: [12598.985770]  <EOI>  [<ffffffff8100a0c7>] ? mwait_idle+0x66/0x6b
Apr 12 18:46:59 80.83.17.1 kernel: [12598.986048]  [<ffffffff81001d24>] ? enter_idle+0x20/0x22
Apr 12 18:46:59 80.83.17.1 kernel: [12598.986284]  [<ffffffff81001d7b>] cpu_idle+0x55/0x8d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.986507]  [<ffffffff8124bba5>] rest_init+0x79/0x7b
Apr 12 18:46:59 80.83.17.1 kernel: [12598.986730]  [<ffffffff813fca70>] start_kernel+0x362/0x36d
Apr 12 18:46:59 80.83.17.1 kernel: [12598.986955]  [<ffffffff813fc0a8>] x86_64_start_reservations+0xa5/0xa9
Apr 12 18:46:59 80.83.17.1 kernel: [12598.987189]  [<ffffffff813fc189>] x86_64_start_kernel+0xdd/0xe4

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-12 16:11           ` Denys Fedorysychenko
@ 2010-04-12 20:09             ` Eric Dumazet
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Dumazet @ 2010-04-12 20:09 UTC (permalink / raw)
  To: Denys Fedorysychenko; +Cc: Krishna Kumar2, David Miller, netdev

Le lundi 12 avril 2010 à 19:11 +0300, Denys Fedorysychenko a écrit :
> On Monday 12 April 2010 12:31:43 Eric Dumazet wrote:
> .
> Seems problem still remain. Patched kernel, but paniced now.
> Btw, i dont have any multiqueue card, i think.
> pièce jointe document texte brut (x.txt)
> Apr 12 18:46:58 80.83.17.1 dropbear[4843]: exit before auth: Disconnect received
> Apr 12 18:46:59 80.83.17.1 dropbear[4845]: Child connection from 82.113.44.186:48692
> Apr 12 18:46:59 80.83.17.1 dropbear[4844]: exit before auth: Disconnect received
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.956375] BUG: unable to handle kernel NULL pointer dereference at (null)
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.956571] IP: [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.956762] PGD 21debc067 PUD 21c881067 PMD 0 
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.956947] Oops: 0000 [#1] SMP 
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957126] last sysfs file: /sys/devices/virtual/vc/vcs3/dev
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957311] CPU 0 
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Pid: 0, comm: swapper Not tainted 2.6.33.2-build-0052test-64 #2         /        
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RIP: 0010:[<ffffffff811e587f>]  [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RSP: 0000:ffff880028203a30  EFLAGS: 00010202
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RAX: 0000000000002000 RBX: 0000000000000000 RCX: ffff880209d8a900
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RDX: ffff88021d870000 RSI: 0000000000000000 RDI: ffff88020a7b48e8
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RBP: ffff880028203a60 R08: ffff88021c8be89c R09: ffff88021c8bec00
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] R10: dead000000200200 R11: dead000000100100 R12: ffff88021f98a880
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] R13: ffff88021d5c0900 R14: ffff88020a7b48e8 R15: ffff88021cbad000
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CR2: 0000000000000000 CR3: 000000021c9d8000 CR4: 00000000000006f0
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Process swapper (pid: 0, threadinfo ffffffff81392000, task ffffffff813a1020)
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Stack:
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  ffff88021d870000 ffff88021d5c0900 0000000000000042 ffff88021d5c0900
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <0> ffff88021cbad000 ffff88021cbad000 ffff880028203a80 ffffffffa01c12a9
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] <0> 0000000000000000 ffff88020a7b48e8 ffff880028203ad0 ffffffff811e540e
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Call Trace:
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  <IRQ> 
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811f648a>] sch_direct_xmit+0x5d/0x16b
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811f6654>] __qdisc_run+0xbc/0xdc
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e5939>] dev_queue_xmit+0x346/0x46d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8120a384>] ip_finish_output2+0x1c2/0x206
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8120a430>] ip_finish_output+0x68/0x6a
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8120a4d2>] ip_output+0xa0/0xa5
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81206d2e>] ip_forward_finish+0x2e/0x32
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81206ff4>] ip_forward+0x2c2/0x322
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81205ae0>] ip_rcv_finish+0x2f0/0x30a
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81205d77>] ip_rcv+0x27d/0x2a4
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8124ad48>] ? vlan_hwaccel_do_receive+0x2b/0xda
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e47b6>] netif_receive_skb+0x450/0x475
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e4909>] napi_skb_finish+0x24/0x3b
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8124b01b>] vlan_gro_receive+0x7c/0x81
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa015d6c5>] e1000_receive_skb+0x4a/0x65 [e1000e]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa015d8cb>] e1000_clean_rx_irq+0x1eb/0x29c [e1000e]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa015ebfb>] e1000_clean+0x75/0x22e [e1000e]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffffa0234d6c>] ? hfsc_dequeue+0x171/0x2a6 [sch_hfsc]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff811e4e56>] net_rx_action+0xa7/0x17a
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81005543>] do_softirq+0x33/0x68
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81039407>] irq_exit+0x36/0x75
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81004c3e>] do_IRQ+0xaa/0xc1
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8125ba93>] ret_from_intr+0x0/0xa
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  <EOI> 
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8100a0c7>] ? mwait_idle+0x66/0x6b
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81001d24>] ? enter_idle+0x20/0x22
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff81001d7b>] cpu_idle+0x55/0x8d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff8124bba5>] rest_init+0x79/0x7b
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff813fca70>] start_kernel+0x362/0x36d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff813fc0a8>] x86_64_start_reservations+0xa5/0xa9
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  [<ffffffff813fc189>] x86_64_start_kernel+0xdd/0xe4
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] Code: e2 48 8b 55 d0 49 c1 e4 07 66 41 8b 86 a6 00 00 00 4c 03 a2 00 03 00 00 80 e4 cf 80 cc 20 49 8b 5c 24 08 66 41 89 86 a6 00 00 00 <48> 83 3b 00 0f 84 bb 00 00 00 4c 8d ab 9c 00 00 00 4c 89 ef e8 
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] RIP  [<ffffffff811e587f>] dev_queue_xmit+0x28c/0x46d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342]  RSP <ffff880028203a30>
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.957342] CR2: 0000000000000000
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.974856] ---[ end trace 739e5480c8ab784f ]---
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.975082] Kernel panic - not syncing: Fatal exception in interrupt
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.975311] Pid: 0, comm: swapper Tainted: G      D    2.6.33.2-build-0052test-64 #2
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.975706] Call Trace:
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.975920]  <IRQ>  [<ffffffff81259753>] panic+0xa0/0x161
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.976200]  [<ffffffff81003293>] ? apic_timer_interrupt+0x13/0x20
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.976431]  [<ffffffff81035673>] ? kmsg_dump+0x112/0x12c
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.976657]  [<ffffffff81006651>] oops_end+0xaa/0xba
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.976882]  [<ffffffff8101e653>] no_context+0x1f3/0x202
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.977113]  [<ffffffff8101e81c>] __bad_area_nosemaphore+0x1ba/0x1e0
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.977347]  [<ffffffff8113f8b3>] ? swiotlb_map_page+0x0/0xd5
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.977577]  [<ffffffffa015c55a>] ? pci_map_single+0x8a/0x99 [e1000e]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.977806]  [<ffffffff8113f0c0>] ? swiotlb_dma_mapping_error+0x18/0x25
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.978045]  [<ffffffffa015a2e0>] ? pci_dma_mapping_error+0x31/0x3d [e1000e]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.978282]  [<ffffffffa015cc37>] ? e1000_xmit_frame+0x6ce/0xa43 [e1000e]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.978513]  [<ffffffff8101e850>] bad_area_nosemaphore+0xe/0x10
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.978741]  [<ffffffff8101eb32>] do_page_fault+0x114/0x24a
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.978967]  [<ffffffff8125bc9f>] page_fault+0x1f/0x30
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.979196]  [<ffffffff811e587f>] ? dev_queue_xmit+0x28c/0x46d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.979426]  [<ffffffffa01c12a9>] vlan_dev_hwaccel_hard_start_xmit+0x68/0x86 [8021q]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.979821]  [<ffffffff811e540e>] dev_hard_start_xmit+0x232/0x304
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.980055]  [<ffffffff811f648a>] sch_direct_xmit+0x5d/0x16b
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.980284]  [<ffffffff811f6654>] __qdisc_run+0xbc/0xdc
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.980514]  [<ffffffff811e5939>] dev_queue_xmit+0x346/0x46d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.980740]  [<ffffffff8120a384>] ip_finish_output2+0x1c2/0x206
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.980966]  [<ffffffff8120a430>] ip_finish_output+0x68/0x6a
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.981197]  [<ffffffff8120a4d2>] ip_output+0xa0/0xa5
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.981427]  [<ffffffff81206d2e>] ip_forward_finish+0x2e/0x32
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.981654]  [<ffffffff81206ff4>] ip_forward+0x2c2/0x322
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.981880]  [<ffffffff81205ae0>] ip_rcv_finish+0x2f0/0x30a
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.982111]  [<ffffffff81205d77>] ip_rcv+0x27d/0x2a4
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.982337]  [<ffffffff8124ad48>] ? vlan_hwaccel_do_receive+0x2b/0xda
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.982566]  [<ffffffff811e47b6>] netif_receive_skb+0x450/0x475
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.982793]  [<ffffffff811e4909>] napi_skb_finish+0x24/0x3b
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.983025]  [<ffffffff8124b01b>] vlan_gro_receive+0x7c/0x81
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.983260]  [<ffffffffa015d6c5>] e1000_receive_skb+0x4a/0x65 [e1000e]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.983492]  [<ffffffffa015d8cb>] e1000_clean_rx_irq+0x1eb/0x29c [e1000e]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.983727]  [<ffffffffa015ebfb>] e1000_clean+0x75/0x22e [e1000e]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.983955]  [<ffffffffa0234d6c>] ? hfsc_dequeue+0x171/0x2a6 [sch_hfsc]
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.984190]  [<ffffffff811e4e56>] net_rx_action+0xa7/0x17a
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.984416]  [<ffffffff81039670>] __do_softirq+0x96/0x11a
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.984642]  [<ffffffff810037cc>] call_softirq+0x1c/0x28
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.984866]  [<ffffffff81005543>] do_softirq+0x33/0x68
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.985097]  [<ffffffff81039407>] irq_exit+0x36/0x75
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.985323]  [<ffffffff81004c3e>] do_IRQ+0xaa/0xc1
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.985546]  [<ffffffff8125ba93>] ret_from_intr+0x0/0xa
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.985770]  <EOI>  [<ffffffff8100a0c7>] ? mwait_idle+0x66/0x6b
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.986048]  [<ffffffff81001d24>] ? enter_idle+0x20/0x22
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.986284]  [<ffffffff81001d7b>] cpu_idle+0x55/0x8d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.986507]  [<ffffffff8124bba5>] rest_init+0x79/0x7b
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.986730]  [<ffffffff813fca70>] start_kernel+0x362/0x36d
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.986955]  [<ffffffff813fc0a8>] x86_64_start_reservations+0xa5/0xa9
> Apr 12 18:46:59 80.83.17.1 kernel: [12598.987189]  [<ffffffff813fc189>] x86_64_start_kernel+0xdd/0xe4

This is becoming tricky :(

This is forwarding case, no socket involved in this case.

If no multiqueue is involved, I dont see how it can happen.

We should take a look at requeues (qdisc congestion), there might be a
problem with them.




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-12  7:18       ` Eric Dumazet
  2010-04-12  7:36         ` David Miller
@ 2010-04-15  6:52         ` David Miller
  2010-04-15  8:02           ` Eric Dumazet
  1 sibling, 1 reply; 37+ messages in thread
From: David Miller @ 2010-04-15  6:52 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 12 Apr 2010 09:18:17 +0200

> [PATCH] net: dev_pick_tx() fix
> 
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> 
> Bug introduced by commit a4ee3ce3
> (net: Use sk_tx_queue_mapping for connected sockets)
> 
> Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

It looks like Denys is still getting crashes even with this patch
applied.  And I also think there is some meric to some of Krishna's
analysis.

To me it seems to make more sense to validate the SKB's queue against
the real actual choosen device's range.

The socket queue index will catch up and eventually become valid
because the dst reset will invalidate the queue setting, and we'll
thus recompute it as needed, as Krishna stated.

So I'm tossing this patch for now since it doesn't even aparently
fix the bug.


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15  6:52         ` David Miller
@ 2010-04-15  8:02           ` Eric Dumazet
  2010-04-15  8:26             ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-15  8:02 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat

Le mercredi 14 avril 2010 à 23:52 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 12 Apr 2010 09:18:17 +0200
> 
> > [PATCH] net: dev_pick_tx() fix
> > 
> > When dev_pick_tx() caches tx queue_index on a socket, we must check
> > socket dst_entry matches skb one, or risk a crash later, as reported by
> > Denys Fedorysychenko, if old packets are in flight during a route
> > change, involving devices with different number of queues.
> > 
> > Bug introduced by commit a4ee3ce3
> > (net: Use sk_tx_queue_mapping for connected sockets)
> > 
> > Reported-by: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> It looks like Denys is still getting crashes even with this patch
> applied.  And I also think there is some meric to some of Krishna's
> analysis.
> 


> To me it seems to make more sense to validate the SKB's queue against
> the real actual choosen device's range.
> 
> The socket queue index will catch up and eventually become valid
> because the dst reset will invalidate the queue setting, and we'll
> thus recompute it as needed, as Krishna stated.
> 

??? 


> So I'm tossing this patch for now since it doesn't even aparently
> fix the bug.

I am a bit lost here David.

Denys got a crash that we cannot explain yet. He said he has no
multiqueue devices, so obviously my patch cant help him.

But this patch was fixing a real issue, I believe I pointed it twice
already...

I'll try to setup an environment to trigger this bug for real, but this
will take time, my dev machines are not multiqueue.




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15  8:02           ` Eric Dumazet
@ 2010-04-15  8:26             ` David Miller
  2010-04-15  8:51               ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2010-04-15  8:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Apr 2010 10:02:33 +0200

> Denys got a crash that we cannot explain yet. He said he has no
> multiqueue devices, so obviously my patch cant help him.
> 
> But this patch was fixing a real issue, I believe I pointed it twice
> already...

Ok, I thought your patch was specifically meant to fix Denys's bug.

The confusion comes from the fact that you mention Denys's crash in
your commit message:

--------------------
When dev_pick_tx() caches tx queue_index on a socket, we must check
socket dst_entry matches skb one, or risk a crash later, as reported by
Denys Fedorysychenko, if old packets are in flight during a route
change, involving devices with different number of queues.
--------------------

Anyways, I studied your patch once more and read the thread discussion
with Krishna again and your patch looks fine.  I'll apply it to
net-2.6, thanks!


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15  8:26             ` David Miller
@ 2010-04-15  8:51               ` Eric Dumazet
  2010-04-15  9:06                 ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-15  8:51 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat

Le jeudi 15 avril 2010 à 01:26 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 15 Apr 2010 10:02:33 +0200
> 
> > Denys got a crash that we cannot explain yet. He said he has no
> > multiqueue devices, so obviously my patch cant help him.
> > 
> > But this patch was fixing a real issue, I believe I pointed it twice
> > already...
> 
> Ok, I thought your patch was specifically meant to fix Denys's bug.
> 
> The confusion comes from the fact that you mention Denys's crash in
> your commit message:
> 
> --------------------
> When dev_pick_tx() caches tx queue_index on a socket, we must check
> socket dst_entry matches skb one, or risk a crash later, as reported by
> Denys Fedorysychenko, if old packets are in flight during a route
> change, involving devices with different number of queues.
> --------------------
> 
> Anyways, I studied your patch once more and read the thread discussion
> with Krishna again and your patch looks fine.  I'll apply it to
> net-2.6, thanks!
> 

In any case, I think there is a fundamental problem with this sk
caching. Because one packet can travel in many stacked devices before
hitting the wire.

(bonding, vlan, ethernet) for example.

Socket cache is meaningfull for one level only...




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15  8:51               ` Eric Dumazet
@ 2010-04-15  9:06                 ` David Miller
  2010-04-15  9:11                   ` Denys Fedorysychenko
  2010-04-15 20:30                   ` Eric Dumazet
  0 siblings, 2 replies; 37+ messages in thread
From: David Miller @ 2010-04-15  9:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Apr 2010 10:51:47 +0200

> In any case, I think there is a fundamental problem with this sk
> caching. Because one packet can travel in many stacked devices before
> hitting the wire.
> 
> (bonding, vlan, ethernet) for example.
> 
> Socket cache is meaningfull for one level only...

We were talking the other day about that 'tun' change to orphan the
SKB on TX, and I mentioned the possibility of just doing this in some
generic location before we give the packet to the device ->xmit()
method.

Such a scheme could help with this problem too.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15  9:06                 ` David Miller
@ 2010-04-15  9:11                   ` Denys Fedorysychenko
  2010-04-15 10:37                     ` Eric Dumazet
  2010-04-15 20:30                   ` Eric Dumazet
  1 sibling, 1 reply; 37+ messages in thread
From: Denys Fedorysychenko @ 2010-04-15  9:11 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, krkumar2, netdev

Btw i have application using tun.

On Thursday 15 April 2010 12:06:19 David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 15 Apr 2010 10:51:47 +0200
> 
> > In any case, I think there is a fundamental problem with this sk
> > caching. Because one packet can travel in many stacked devices before
> > hitting the wire.
> >
> > (bonding, vlan, ethernet) for example.
> >
> > Socket cache is meaningfull for one level only...
> 
> We were talking the other day about that 'tun' change to orphan the
> SKB on TX, and I mentioned the possibility of just doing this in some
> generic location before we give the packet to the device ->xmit()
> method.
> 
> Such a scheme could help with this problem too.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15  9:11                   ` Denys Fedorysychenko
@ 2010-04-15 10:37                     ` Eric Dumazet
  2010-04-29 10:50                       ` Denys Fedorysychenko
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-15 10:37 UTC (permalink / raw)
  To: Denys Fedorysychenko; +Cc: David Miller, krkumar2, netdev

Le jeudi 15 avril 2010 à 12:11 +0300, Denys Fedorysychenko a écrit :
> Btw i have application using tun.

Could you add following sanity test to catch the error ?

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa8b476..b67274a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -988,6 +988,7 @@ static inline
 struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
 					 unsigned int index)
 {
+	WARN_ON(index >= dev->num_tx_queues);
 	return &dev->_tx[index];
 }
 



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15  9:06                 ` David Miller
  2010-04-15  9:11                   ` Denys Fedorysychenko
@ 2010-04-15 20:30                   ` Eric Dumazet
  2010-04-15 20:46                     ` Eric Dumazet
  1 sibling, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-15 20:30 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat

Le jeudi 15 avril 2010 à 02:06 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 15 Apr 2010 10:51:47 +0200
> 
> > In any case, I think there is a fundamental problem with this sk
> > caching. Because one packet can travel in many stacked devices before
> > hitting the wire.
> > 
> > (bonding, vlan, ethernet) for example.
> > 
> > Socket cache is meaningfull for one level only...
> 
> We were talking the other day about that 'tun' change to orphan the
> SKB on TX, and I mentioned the possibility of just doing this in some
> generic location before we give the packet to the device ->xmit()
> method.
> 
> Such a scheme could help with this problem too.

Same thing we did with 

if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
	skb_dst_drop(skb);

in dev_hard_start_xmit() ?

Problem is this skb_tstamp_tx() thing....

One possibility would be to change skb_orphan() to let everything done
by destructor. 

One more argument would let destructor() know if this is the final
destructor() called from skb_release_head_state()

destructor() would be responsible to set skb->destructor and/or skb->sk
to NULL when possible. 


All normal destructors would not care of this 2nd argument and just do
what they actually do, plus setting skb->sk = NULL, skb->destructor =
NULL

Fast path would stay as today, no extra test.

Only tstamp users would need to setup another destructor, a bit more
complex (it would have to take a look at 2nd argument before really
doing the job)


Completely untested patch to get the idea :
(to be completed for the tstamp thing)


 include/linux/skbuff.h         |    8 +++-----
 include/net/sctp/sctp.h        |    2 +-
 include/net/sock.h             |    4 ++--
 net/caif/caif_socket.c         |    4 +++-
 net/core/dev.c                 |   26 +++++++++-----------------
 net/core/skbuff.c              |    2 +-
 net/core/sock.c                |    8 ++++++--
 net/l2tp/l2tp_core.c           |    4 +++-
 net/netfilter/nf_tproxy_core.c |    2 +-
 net/packet/af_packet.c         |    4 ++--
 net/unix/af_unix.c             |    4 ++--
 11 files changed, 33 insertions(+), 35 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 38501d2..7dfd833 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -353,7 +353,7 @@ struct sk_buff {
 	kmemcheck_bitfield_end(flags1);
 	__be16			protocol;
 
-	void			(*destructor)(struct sk_buff *skb);
+	void			(*destructor)(struct sk_buff *skb, int final);
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	struct nf_conntrack	*nfct;
 	struct sk_buff		*nfct_reasm;
@@ -1407,15 +1407,13 @@ static inline void pskb_trim_unique(struct sk_buff *skb, unsigned int len)
  *	@skb: buffer to orphan
  *
  *	If a buffer currently has an owner then we call the owner's
- *	destructor function and make the @skb unowned. The buffer continues
+ *	destructor function. The buffer continues
  *	to exist but is no longer charged to its former owner.
  */
 static inline void skb_orphan(struct sk_buff *skb)
 {
 	if (skb->destructor)
-		skb->destructor(skb);
-	skb->destructor = NULL;
-	skb->sk		= NULL;
+		skb->destructor(skb, 0);
 }
 
 /**
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 5915155..49e2162 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -130,7 +130,7 @@ int sctp_inet_listen(struct socket *sock, int backlog);
 void sctp_write_space(struct sock *sk);
 unsigned int sctp_poll(struct file *file, struct socket *sock,
 		poll_table *wait);
-void sctp_sock_rfree(struct sk_buff *skb);
+void sctp_sock_rfree(struct sk_buff *skb, int final);
 void sctp_copy_sock(struct sock *newsk, struct sock *sk,
 		    struct sctp_association *asoc);
 extern struct percpu_counter sctp_sockets_allocated;
diff --git a/include/net/sock.h b/include/net/sock.h
index 56df440..a042c9d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -988,8 +988,8 @@ extern struct sk_buff		*sock_wmalloc(struct sock *sk,
 extern struct sk_buff		*sock_rmalloc(struct sock *sk,
 					      unsigned long size, int force,
 					      gfp_t priority);
-extern void			sock_wfree(struct sk_buff *skb);
-extern void			sock_rfree(struct sk_buff *skb);
+extern void			sock_wfree(struct sk_buff *skb, int final);
+extern void			sock_rfree(struct sk_buff *skb, int final);
 
 extern int			sock_setsockopt(struct socket *sock, int level,
 						int op, char __user *optval,
diff --git a/net/caif/caif_socket.c b/net/caif/caif_socket.c
index cdf62b9..4e7276a 100644
--- a/net/caif/caif_socket.c
+++ b/net/caif/caif_socket.c
@@ -256,10 +256,12 @@ static void caif_sktflowctrl_cb(struct cflayer *layr,
 	}
 }
 
-static void skb_destructor(struct sk_buff *skb)
+static void skb_destructor(struct sk_buff *skb, int final)
 {
 	dbfs_atomic_inc(&cnt.skb_free);
 	dbfs_atomic_dec(&cnt.skb_in_use);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 876b111..9bffbe5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1828,12 +1828,12 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 }
 
 struct dev_gso_cb {
-	void (*destructor)(struct sk_buff *skb);
+	void (*destructor)(struct sk_buff *skb, int final);
 };
 
 #define DEV_GSO_CB(skb) ((struct dev_gso_cb *)(skb)->cb)
 
-static void dev_gso_skb_destructor(struct sk_buff *skb)
+static void dev_gso_skb_destructor(struct sk_buff *skb, int final)
 {
 	struct dev_gso_cb *cb;
 
@@ -1847,7 +1847,9 @@ static void dev_gso_skb_destructor(struct sk_buff *skb)
 
 	cb = DEV_GSO_CB(skb);
 	if (cb->destructor)
-		cb->destructor(skb);
+		cb->destructor(skb, final);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 
 /**
@@ -1904,23 +1906,11 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(skb);
 
+		skb_orphan(skb);
+
 		rc = ops->ndo_start_xmit(skb, dev);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
-		/*
-		 * TODO: if skb_orphan() was called by
-		 * dev->hard_start_xmit() (for example, the unmodified
-		 * igb driver does that; bnx2 doesn't), then
-		 * skb_tx_software_timestamp() will be unable to send
-		 * back the time stamp.
-		 *
-		 * How can this be prevented? Always create another
-		 * reference to the socket before calling
-		 * dev->hard_start_xmit()? Prevent that skb_orphan()
-		 * does anything in dev->hard_start_xmit() by clearing
-		 * the skb destructor before the call and restoring it
-		 * afterwards, then doing the skb_orphan() ourselves?
-		 */
 		return rc;
 	}
 
@@ -1938,6 +1928,8 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
+		skb_orphan(nskb);
+
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index bdea0ef..90c171f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -396,7 +396,7 @@ static void skb_release_head_state(struct sk_buff *skb)
 #endif
 	if (skb->destructor) {
 		WARN_ON(in_irq());
-		skb->destructor(skb);
+		skb->destructor(skb, 1);
 	}
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	nf_conntrack_put(skb->nfct);
diff --git a/net/core/sock.c b/net/core/sock.c
index 7effa1e..fcf67ea 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1259,7 +1259,7 @@ void __init sk_init(void)
 /*
  * Write buffer destructor automatically called from kfree_skb.
  */
-void sock_wfree(struct sk_buff *skb)
+void sock_wfree(struct sk_buff *skb, int final)
 {
 	struct sock *sk = skb->sk;
 	unsigned int len = skb->truesize;
@@ -1279,18 +1279,22 @@ void sock_wfree(struct sk_buff *skb)
 	 */
 	if (atomic_sub_and_test(len, &sk->sk_wmem_alloc))
 		__sk_free(sk);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 EXPORT_SYMBOL(sock_wfree);
 
 /*
  * Read buffer destructor automatically called from kfree_skb.
  */
-void sock_rfree(struct sk_buff *skb)
+void sock_rfree(struct sk_buff *skb, int final)
 {
 	struct sock *sk = skb->sk;
 
 	atomic_sub(skb->truesize, &sk->sk_rmem_alloc);
 	sk_mem_uncharge(skb->sk, skb->truesize);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 EXPORT_SYMBOL(sock_rfree);
 
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 98dfcce..a3b0a95 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -973,9 +973,11 @@ EXPORT_SYMBOL_GPL(l2tp_xmit_core);
 
 /* Automatically called when the skb is freed.
  */
-static void l2tp_sock_wfree(struct sk_buff *skb)
+static void l2tp_sock_wfree(struct sk_buff *skb, int final)
 {
 	sock_put(skb->sk);
+	skb->sk = NULL;
+	skb->destructor = NULL;
 }
 
 /* For data skbs that we transmit, we associate with the tunnel socket
diff --git a/net/netfilter/nf_tproxy_core.c b/net/netfilter/nf_tproxy_core.c
index 5490fc3..17dd2d9 100644
--- a/net/netfilter/nf_tproxy_core.c
+++ b/net/netfilter/nf_tproxy_core.c
@@ -55,7 +55,7 @@ nf_tproxy_get_sock_v4(struct net *net, const u8 protocol,
 EXPORT_SYMBOL_GPL(nf_tproxy_get_sock_v4);
 
 static void
-nf_tproxy_destructor(struct sk_buff *skb)
+nf_tproxy_destructor(struct sk_buff *skb, int final)
 {
 	struct sock *sk = skb->sk;
 
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index f162d59..dc8e843 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -808,7 +808,7 @@ ring_is_full:
 	goto drop_n_restore;
 }
 
-static void tpacket_destruct_skb(struct sk_buff *skb)
+static void tpacket_destruct_skb(struct sk_buff *skb, int final)
 {
 	struct packet_sock *po = pkt_sk(skb->sk);
 	void *ph;
@@ -823,7 +823,7 @@ static void tpacket_destruct_skb(struct sk_buff *skb)
 		__packet_set_status(po, ph, TP_STATUS_AVAILABLE);
 	}
 
-	sock_wfree(skb);
+	sock_wfree(skb, final);
 }
 
 static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 3d9122e..17fca55 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1303,7 +1303,7 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb)
 		unix_notinflight(scm->fp->fp[i]);
 }
 
-static void unix_destruct_fds(struct sk_buff *skb)
+static void unix_destruct_fds(struct sk_buff *skb, int final)
 {
 	struct scm_cookie scm;
 	memset(&scm, 0, sizeof(scm));
@@ -1312,7 +1312,7 @@ static void unix_destruct_fds(struct sk_buff *skb)
 	/* Alas, it calls VFS */
 	/* So fscking what? fput() had been SMP-safe since the last Summer */
 	scm_destroy(&scm);
-	sock_wfree(skb);
+	sock_wfree(skb, final);
 }
 
 static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15 20:30                   ` Eric Dumazet
@ 2010-04-15 20:46                     ` Eric Dumazet
  2010-04-15 21:33                       ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-15 20:46 UTC (permalink / raw)
  To: David Miller; +Cc: krkumar2, netdev, nuclearcat

Le jeudi 15 avril 2010 à 22:30 +0200, Eric Dumazet a écrit :
>  
> @@ -1938,6 +1928,8 @@ gso:
>  		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>  			skb_dst_drop(nskb);
>  
> +		skb_orphan(nskb);
> +
>  		rc = ops->ndo_start_xmit(nskb, dev);
>  		if (unlikely(rc != NETDEV_TX_OK)) {
>  			if (rc & ~NETDEV_TX_MASK)

Well, might need to test if skb is not shared before orphaning it

	if (!skb_shared(skb))
		skb_orphan(nskb);




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15 20:46                     ` Eric Dumazet
@ 2010-04-15 21:33                       ` David Miller
  2010-04-16 22:18                         ` [PATCH net-next-2.6] net: Introduce skb_orphan_try() Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2010-04-15 21:33 UTC (permalink / raw)
  To: eric.dumazet; +Cc: krkumar2, netdev, nuclearcat

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 15 Apr 2010 22:46:15 +0200

> Le jeudi 15 avril 2010 à 22:30 +0200, Eric Dumazet a écrit :
>>  
>> @@ -1938,6 +1928,8 @@ gso:
>>  		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>>  			skb_dst_drop(nskb);
>>  
>> +		skb_orphan(nskb);
>> +
>>  		rc = ops->ndo_start_xmit(nskb, dev);
>>  		if (unlikely(rc != NETDEV_TX_OK)) {
>>  			if (rc & ~NETDEV_TX_MASK)
> 
> Well, might need to test if skb is not shared before orphaning it
> 
> 	if (!skb_shared(skb))
> 		skb_orphan(nskb);

If it's not legal to skb_orphan() here then it would not be legal for
the drivers to unconditionally skb_orphan(), which they do.

So either your test is unnecessary, or we have a big existing problem
:-)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-15 21:33                       ` David Miller
@ 2010-04-16 22:18                         ` Eric Dumazet
  2010-04-18  9:46                           ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-16 22:18 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Le jeudi 15 avril 2010 à 14:33 -0700, David Miller a écrit :

> If it's not legal to skb_orphan() here then it would not be legal for
> the drivers to unconditionally skb_orphan(), which they do.
> 
> So either your test is unnecessary, or we have a big existing problem
> :-)

I cooked following patch, introducing skb_orphan_try() helper, to
document all known exceptions.

I have a possible followup for this patch :

Orphaning skbs earlier could also make dev_kfree_skb_irq() faster.
Instead of queing skb into completion_queue and triggering
NET_TX_SOFTIRQ, we would directly free an orphaned skb ?



[PATCH net-next-2.6] net: Introduce skb_orphan_try()

Transmitted skb might be attached to a socket and a destructor, for
memory accounting purposes.

Traditionally, this destructor is called at tx completion time, when skb
is freed.

When tx completion is performed by another cpu than the sender, this
forces some cache lines to change ownership. XPS was an attempt to give
tx completion to initial cpu.

David idea is to call destructor right before giving skb to device (call
to ndo_start_xmit()). Because device queues are usually small, orphaning
skb before tx completion is not a big deal. Some drivers already do
this, we could do it in upper level.

There is one known exception to this early orphaning, called tx
timestamping. It needs to keep a reference to socket until device can
give a hardware or software timestamp.

This patch adds a skb_orphan_try() helper, to centralize all exceptions
to early orphaning in one spot, and use it in dev_hard_start_xmit().

"tbench 16" results on a Nehalem machine (2 X5570  @ 2.93GHz)
before: Throughput 4428.9 MB/sec 16 procs
after: Throughput 4448.14 MB/sec 16 procs

UDP should get even better results, its destructor being more complex,
since SOCK_USE_WRITE_QUEUE is not set (four atomic ops instead of one)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/dev.c b/net/core/dev.c
index e8041eb..acae5fe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1880,6 +1880,17 @@ static int dev_gso_segment(struct sk_buff *skb)
 	return 0;
 }
 
+/*
+ * Try to orphan skb early, right before transmission by the device.
+ * We cannot orphan skb if tx timestamp is requested, since
+ * drivers need to call skb_tstamp_tx() to send the timestamp.
+ */
+static inline void skb_orphan_try(struct sk_buff *skb)
+{
+	if (!skb_tx(skb)->flags)
+		skb_orphan(skb);
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1904,23 +1915,10 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(skb);
 
+		skb_orphan_try(skb);
 		rc = ops->ndo_start_xmit(skb, dev);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);
-		/*
-		 * TODO: if skb_orphan() was called by
-		 * dev->hard_start_xmit() (for example, the unmodified
-		 * igb driver does that; bnx2 doesn't), then
-		 * skb_tx_software_timestamp() will be unable to send
-		 * back the time stamp.
-		 *
-		 * How can this be prevented? Always create another
-		 * reference to the socket before calling
-		 * dev->hard_start_xmit()? Prevent that skb_orphan()
-		 * does anything in dev->hard_start_xmit() by clearing
-		 * the skb destructor before the call and restoring it
-		 * afterwards, then doing the skb_orphan() ourselves?
-		 */
 		return rc;
 	}
 
@@ -1938,6 +1936,7 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
+		skb_orphan_try(nskb);
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-16 22:18                         ` [PATCH net-next-2.6] net: Introduce skb_orphan_try() Eric Dumazet
@ 2010-04-18  9:46                           ` David Miller
  2010-04-21  6:08                             ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2010-04-18  9:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 17 Apr 2010 00:18:22 +0200

> Le jeudi 15 avril 2010 à 14:33 -0700, David Miller a écrit :
> 
>> If it's not legal to skb_orphan() here then it would not be legal for
>> the drivers to unconditionally skb_orphan(), which they do.
>> 
>> So either your test is unnecessary, or we have a big existing problem
>> :-)
> 
> I cooked following patch, introducing skb_orphan_try() helper, to
> document all known exceptions.

Looks good, applied, thanks Eric.

That timestamping issue, I bet there is some simple solution hiding
in the bushes for that one?

Hmmm, something like... there is a less transient piece of state
(perhaps in the socket) and the skb has a pointer to that piece of
state.  Then the driver writes into this pointed-to place rather
into some member of the skb itself.

Then all of these problems with referencing the skb metadata across
being passed to ->ndo_start_xmit() could go away I think.

> I have a possible followup for this patch :
> 
> Orphaning skbs earlier could also make dev_kfree_skb_irq() faster.
> Instead of queing skb into completion_queue and triggering
> NET_TX_SOFTIRQ, we would directly free an orphaned skb ?

Sounds great.  But I don't see dev_kfree_skb_irq() as much of a
performance priority, since all sane modern drivers free skbs from
softirq context (via NAPI).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-18  9:46                           ` David Miller
@ 2010-04-21  6:08                             ` Eric Dumazet
  2010-04-22  5:56                               ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-21  6:08 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Le dimanche 18 avril 2010 à 02:46 -0700, David Miller a écrit :

> Looks good, applied, thanks Eric.

Hmm, looking at the GSO stuff, I believe we should not call
skb_orphan_try() on gso skbs ?

Thanks !

[PATCH net-next-2.6] net: Dont call skb_orphan_try() for GSO

At this point, skb->destructor is not the original one (stored in
DEV_GSO_CB(skb)->destructor)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/core/dev.c b/net/core/dev.c
index b31d5d6..200d1e8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1937,7 +1937,6 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
-		skb_orphan_try(nskb);
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)



^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-21  6:08                             ` Eric Dumazet
@ 2010-04-22  5:56                               ` David Miller
  2010-04-22  7:10                                 ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2010-04-22  5:56 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 21 Apr 2010 08:08:36 +0200

> Le dimanche 18 avril 2010 à 02:46 -0700, David Miller a écrit :
> 
>> Looks good, applied, thanks Eric.
> 
> Hmm, looking at the GSO stuff, I believe we should not call
> skb_orphan_try() on gso skbs ?

Right, I've applied this, thanks.

What we should probably do instead is call and NULL out the
DEV_GSO_CB() destructor.  Right?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-22  5:56                               ` David Miller
@ 2010-04-22  7:10                                 ` Eric Dumazet
  2010-04-22  7:16                                   ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-22  7:10 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit :

> Right, I've applied this, thanks.
> 
> What we should probably do instead is call and NULL out the
> DEV_GSO_CB() destructor.  Right?

Yes, probably, I'll take a look at this if you want.

Thanks



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-22  7:10                                 ` Eric Dumazet
@ 2010-04-22  7:16                                   ` David Miller
  2010-04-22  7:24                                     ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2010-04-22  7:16 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Apr 2010 09:10:33 +0200

> Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit :
> 
>> Right, I've applied this, thanks.
>> 
>> What we should probably do instead is call and NULL out the
>> DEV_GSO_CB() destructor.  Right?
> 
> Yes, probably, I'll take a look at this if you want.

It might look something like this:

diff --git a/net/core/dev.c b/net/core/dev.c
index 9bf1ccc..13241da 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1892,6 +1892,20 @@ static inline void skb_orphan_try(struct sk_buff *skb)
 		skb_orphan(skb);
 }
 
+/*
+ * GSO packets need to be handled specially because such packets
+ * hold the normal SKB destructor in a backup pointer.
+ */
+static inline void skb_orphan_try_gso(struct sk_buff *skb)
+{
+	if (!skb_tx(skb)->flags) {
+		if (DEV_GSO_CB(skb)->destructor)
+			DEV_GSO_CB(skb)->destructor(skb);
+		DEV_GSO_CB(skb)->destructor = NULL;
+		skb->sk = NULL;
+	}
+}
+
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1937,6 +1951,7 @@ gso:
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(nskb);
 
+		skb_orphan_try_gso(skb);
 		rc = ops->ndo_start_xmit(nskb, dev);
 		if (unlikely(rc != NETDEV_TX_OK)) {
 			if (rc & ~NETDEV_TX_MASK)

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-22  7:16                                   ` David Miller
@ 2010-04-22  7:24                                     ` Eric Dumazet
  2010-04-22  7:26                                       ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-22  7:24 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Le jeudi 22 avril 2010 à 00:16 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 22 Apr 2010 09:10:33 +0200
> 
> > Le mercredi 21 avril 2010 à 22:56 -0700, David Miller a écrit :
> > 
> >> Right, I've applied this, thanks.
> >> 
> >> What we should probably do instead is call and NULL out the
> >> DEV_GSO_CB() destructor.  Right?
> > 
> > Yes, probably, I'll take a look at this if you want.
> 
> It might look something like this:
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 9bf1ccc..13241da 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1892,6 +1892,20 @@ static inline void skb_orphan_try(struct sk_buff *skb)
>  		skb_orphan(skb);
>  }
>  
> +/*
> + * GSO packets need to be handled specially because such packets
> + * hold the normal SKB destructor in a backup pointer.
> + */
> +static inline void skb_orphan_try_gso(struct sk_buff *skb)
> +{
> +	if (!skb_tx(skb)->flags) {
> +		if (DEV_GSO_CB(skb)->destructor)
> +			DEV_GSO_CB(skb)->destructor(skb);
> +		DEV_GSO_CB(skb)->destructor = NULL;
> +		skb->sk = NULL;
> +	}
> +}
> +
>  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  			struct netdev_queue *txq)
>  {
> @@ -1937,6 +1951,7 @@ gso:
>  		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>  			skb_dst_drop(nskb);
>  
> +		skb_orphan_try_gso(skb);
>  		rc = ops->ndo_start_xmit(nskb, dev);
>  		if (unlikely(rc != NETDEV_TX_OK)) {
>  			if (rc & ~NETDEV_TX_MASK)

Hmm... are you sure we want to call destructor for each skb ?

Should'nt we do it before initial skb is split ?




^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-22  7:24                                     ` Eric Dumazet
@ 2010-04-22  7:26                                       ` David Miller
  2010-04-22  7:33                                         ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2010-04-22  7:26 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Apr 2010 09:24:05 +0200

> Hmm... are you sure we want to call destructor for each skb ?
> 
> Should'nt we do it before initial skb is split ?

Good idea, therefore you mean something like this?

diff --git a/net/core/dev.c b/net/core/dev.c
index 3ba774b..f3c3885 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb)
 	int features = dev->features & ~(illegal_highdma(dev, skb) ?
 					 NETIF_F_SG : 0);
 
+	skb_orphan_try(skb);
 	segs = skb_gso_segment(skb, features);
 
 	/* Verifying header integrity only. */

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-22  7:26                                       ` David Miller
@ 2010-04-22  7:33                                         ` Eric Dumazet
  2010-04-22  7:41                                           ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-22  7:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Le jeudi 22 avril 2010 à 00:26 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 22 Apr 2010 09:24:05 +0200
> 
> > Hmm... are you sure we want to call destructor for each skb ?
> > 
> > Should'nt we do it before initial skb is split ?
> 
> Good idea, therefore you mean something like this?
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3ba774b..f3c3885 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb)
>  	int features = dev->features & ~(illegal_highdma(dev, skb) ?
>  					 NETIF_F_SG : 0);
>  
> +	skb_orphan_try(skb);
>  	segs = skb_gso_segment(skb, features);
>  
>  	/* Verifying header integrity only. */

Yes, it seems better.

What about the 

if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
	skb_dst_drop(skb);

This thing might also be moved before the split, since split probably
clone all dst ?



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-22  7:33                                         ` Eric Dumazet
@ 2010-04-22  7:41                                           ` David Miller
  2010-04-22  7:47                                             ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2010-04-22  7:41 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Apr 2010 09:33:57 +0200

> Le jeudi 22 avril 2010 à 00:26 -0700, David Miller a écrit :
>> @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb)
>>  	int features = dev->features & ~(illegal_highdma(dev, skb) ?
>>  					 NETIF_F_SG : 0);
>>  
>> +	skb_orphan_try(skb);
>>  	segs = skb_gso_segment(skb, features);
>>  
>>  	/* Verifying header integrity only. */
> 
> Yes, it seems better.
> 
> What about the 
> 
> if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
> 	skb_dst_drop(skb);
> 
> This thing might also be moved before the split, since split probably
> clone all dst ?

Good catch, agreed.

diff --git a/net/core/dev.c b/net/core/dev.c
index 3ba774b..4f897e2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1851,6 +1851,17 @@ static void dev_gso_skb_destructor(struct sk_buff *skb)
 		cb->destructor(skb);
 }
 
+/*
+ * Try to orphan skb early, right before transmission by the device.
+ * We cannot orphan skb if tx timestamp is requested, since
+ * drivers need to call skb_tstamp_tx() to send the timestamp.
+ */
+static inline void skb_orphan_try(struct sk_buff *skb)
+{
+	if (!skb_tx(skb)->flags)
+		skb_orphan(skb);
+}
+
 /**
  *	dev_gso_segment - Perform emulated hardware segmentation on skb.
  *	@skb: buffer to segment
@@ -1865,6 +1876,7 @@ static int dev_gso_segment(struct sk_buff *skb)
 	int features = dev->features & ~(illegal_highdma(dev, skb) ?
 					 NETIF_F_SG : 0);
 
+	skb_orphan_try(skb);
 	segs = skb_gso_segment(skb, features);
 
 	/* Verifying header integrity only. */
@@ -1881,17 +1893,6 @@ static int dev_gso_segment(struct sk_buff *skb)
 	return 0;
 }
 
-/*
- * Try to orphan skb early, right before transmission by the device.
- * We cannot orphan skb if tx timestamp is requested, since
- * drivers need to call skb_tstamp_tx() to send the timestamp.
- */
-static inline void skb_orphan_try(struct sk_buff *skb)
-{
-	if (!skb_tx(skb)->flags)
-		skb_orphan(skb);
-}
-
 int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			struct netdev_queue *txq)
 {
@@ -1902,13 +1903,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (!list_empty(&ptype_all))
 			dev_queue_xmit_nit(skb, dev);
 
-		if (netif_needs_gso(dev, skb)) {
-			if (unlikely(dev_gso_segment(skb)))
-				goto out_kfree_skb;
-			if (skb->next)
-				goto gso;
-		}
-
 		/*
 		 * If device doesnt need skb->dst, release it right now while
 		 * its hot in this cpu cache
@@ -1916,6 +1910,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
 			skb_dst_drop(skb);
 
+		if (netif_needs_gso(dev, skb)) {
+			if (unlikely(dev_gso_segment(skb)))
+				goto out_kfree_skb;
+			if (skb->next)
+				goto gso;
+		}
+
 		skb_orphan_try(skb);
 		rc = ops->ndo_start_xmit(skb, dev);
 		if (rc == NETDEV_TX_OK)

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-22  7:41                                           ` David Miller
@ 2010-04-22  7:47                                             ` Eric Dumazet
  2010-04-22  7:54                                               ` David Miller
  0 siblings, 1 reply; 37+ messages in thread
From: Eric Dumazet @ 2010-04-22  7:47 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Le jeudi 22 avril 2010 à 00:41 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 22 Apr 2010 09:33:57 +0200
> 
> > Le jeudi 22 avril 2010 à 00:26 -0700, David Miller a écrit :
> >> @@ -1865,6 +1865,7 @@ static int dev_gso_segment(struct sk_buff *skb)
> >>  	int features = dev->features & ~(illegal_highdma(dev, skb) ?
> >>  					 NETIF_F_SG : 0);
> >>  
> >> +	skb_orphan_try(skb);
> >>  	segs = skb_gso_segment(skb, features);
> >>  
> >>  	/* Verifying header integrity only. */
> > 
> > Yes, it seems better.
> > 
> > What about the 
> > 
> > if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
> > 	skb_dst_drop(skb);
> > 
> > This thing might also be moved before the split, since split probably
> > clone all dst ?
> 
> Good catch, agreed.
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3ba774b..4f897e2 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1851,6 +1851,17 @@ static void dev_gso_skb_destructor(struct sk_buff *skb)
>  		cb->destructor(skb);
>  }
>  
> +/*
> + * Try to orphan skb early, right before transmission by the device.
> + * We cannot orphan skb if tx timestamp is requested, since
> + * drivers need to call skb_tstamp_tx() to send the timestamp.
> + */
> +static inline void skb_orphan_try(struct sk_buff *skb)
> +{
> +	if (!skb_tx(skb)->flags)
> +		skb_orphan(skb);
> +}
> +
>  /**
>   *	dev_gso_segment - Perform emulated hardware segmentation on skb.
>   *	@skb: buffer to segment
> @@ -1865,6 +1876,7 @@ static int dev_gso_segment(struct sk_buff *skb)
>  	int features = dev->features & ~(illegal_highdma(dev, skb) ?
>  					 NETIF_F_SG : 0);
>  
> +	skb_orphan_try(skb);
>  	segs = skb_gso_segment(skb, features);
>  
>  	/* Verifying header integrity only. */
> @@ -1881,17 +1893,6 @@ static int dev_gso_segment(struct sk_buff *skb)
>  	return 0;
>  }
>  
> -/*
> - * Try to orphan skb early, right before transmission by the device.
> - * We cannot orphan skb if tx timestamp is requested, since
> - * drivers need to call skb_tstamp_tx() to send the timestamp.
> - */
> -static inline void skb_orphan_try(struct sk_buff *skb)
> -{
> -	if (!skb_tx(skb)->flags)
> -		skb_orphan(skb);
> -}
> -
>  int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  			struct netdev_queue *txq)
>  {
> @@ -1902,13 +1903,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  		if (!list_empty(&ptype_all))
>  			dev_queue_xmit_nit(skb, dev);
>  
> -		if (netif_needs_gso(dev, skb)) {
> -			if (unlikely(dev_gso_segment(skb)))
> -				goto out_kfree_skb;
> -			if (skb->next)
> -				goto gso;
> -		}
> -
>  		/*
>  		 * If device doesnt need skb->dst, release it right now while
>  		 * its hot in this cpu cache
> @@ -1916,6 +1910,13 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  		if (dev->priv_flags & IFF_XMIT_DST_RELEASE)
>  			skb_dst_drop(skb);
>  
> +		if (netif_needs_gso(dev, skb)) {
> +			if (unlikely(dev_gso_segment(skb)))
> +				goto out_kfree_skb;
> +			if (skb->next)
> +				goto gso;
> +		}
> +
>  		skb_orphan_try(skb);
>  		rc = ops->ndo_start_xmit(skb, dev);
>  		if (rc == NETDEV_TX_OK)

You could have one skb_orphan_try() call before the

if (netif_needs_gso(dev, skb)) {

and remove it from dev_gso_segment() ?



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-22  7:47                                             ` Eric Dumazet
@ 2010-04-22  7:54                                               ` David Miller
  2010-04-22  7:59                                                 ` Eric Dumazet
  0 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2010-04-22  7:54 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 22 Apr 2010 09:47:03 +0200

> You could have one skb_orphan_try() call before the
> 
> if (netif_needs_gso(dev, skb)) {
> 
> and remove it from dev_gso_segment() ?

Yes, that's much more concise.  This should be ready to go:

net: Orphan and de-dst skbs earlier in xmit path.

This way GSO packets don't get handled differently.

With help from Eric Dumazet.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/dev.c b/net/core/dev.c
index 3ba774b..a4a7c36 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1902,13 +1902,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 		if (!list_empty(&ptype_all))
 			dev_queue_xmit_nit(skb, dev);
 
-		if (netif_needs_gso(dev, skb)) {
-			if (unlikely(dev_gso_segment(skb)))
-				goto out_kfree_skb;
-			if (skb->next)
-				goto gso;
-		}
-
 		/*
 		 * If device doesnt need skb->dst, release it right now while
 		 * its hot in this cpu cache
@@ -1917,6 +1910,14 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 			skb_dst_drop(skb);
 
 		skb_orphan_try(skb);
+
+		if (netif_needs_gso(dev, skb)) {
+			if (unlikely(dev_gso_segment(skb)))
+				goto out_kfree_skb;
+			if (skb->next)
+				goto gso;
+		}
+
 		rc = ops->ndo_start_xmit(skb, dev);
 		if (rc == NETDEV_TX_OK)
 			txq_trans_update(txq);

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH net-next-2.6] net: Introduce skb_orphan_try()
  2010-04-22  7:54                                               ` David Miller
@ 2010-04-22  7:59                                                 ` Eric Dumazet
  0 siblings, 0 replies; 37+ messages in thread
From: Eric Dumazet @ 2010-04-22  7:59 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Le jeudi 22 avril 2010 à 00:54 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 22 Apr 2010 09:47:03 +0200
> 
> > You could have one skb_orphan_try() call before the
> > 
> > if (netif_needs_gso(dev, skb)) {
> > 
> > and remove it from dev_gso_segment() ?
> 
> Yes, that's much more concise.  This should be ready to go:
> 
> net: Orphan and de-dst skbs earlier in xmit path.
> 
> This way GSO packets don't get handled differently.
> 
> With help from Eric Dumazet.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 3ba774b..a4a7c36 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -1902,13 +1902,6 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  		if (!list_empty(&ptype_all))
>  			dev_queue_xmit_nit(skb, dev);
>  
> -		if (netif_needs_gso(dev, skb)) {
> -			if (unlikely(dev_gso_segment(skb)))
> -				goto out_kfree_skb;
> -			if (skb->next)
> -				goto gso;
> -		}
> -
>  		/*
>  		 * If device doesnt need skb->dst, release it right now while
>  		 * its hot in this cpu cache
> @@ -1917,6 +1910,14 @@ int dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  			skb_dst_drop(skb);
>  
>  		skb_orphan_try(skb);
> +
> +		if (netif_needs_gso(dev, skb)) {
> +			if (unlikely(dev_gso_segment(skb)))
> +				goto out_kfree_skb;
> +			if (skb->next)
> +				goto gso;
> +		}
> +
>  		rc = ops->ndo_start_xmit(skb, dev);
>  		if (rc == NETDEV_TX_OK)
>  			txq_trans_update(txq);

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Thanks David !



^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: NULL pointer dereference panic in stable (2.6.33.2), amd64
  2010-04-15 10:37                     ` Eric Dumazet
@ 2010-04-29 10:50                       ` Denys Fedorysychenko
  0 siblings, 0 replies; 37+ messages in thread
From: Denys Fedorysychenko @ 2010-04-29 10:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, krkumar2, netdev

On Thursday 15 April 2010 13:37:10 Eric Dumazet wrote:
> Le jeudi 15 avril 2010 à 12:11 +0300, Denys Fedorysychenko a écrit :
> > Btw i have application using tun.
> 
> Could you add following sanity test to catch the error ?
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index fa8b476..b67274a 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -988,6 +988,7 @@ static inline
>  struct netdev_queue *netdev_get_tx_queue(const struct net_device *dev,
>  					 unsigned int index)
>  {
> +	WARN_ON(index >= dev->num_tx_queues);
>  	return &dev->_tx[index];
>  }
> 
Very sorry for being late, just i found way to stabilize kernel for me and to 
solve my personal life issues. It took 2 weeks...
I will try this patch, but i'm sure i dont have multiqueue card there.

I had shaper on eth0.33 and eth0 so i disable all offloading except 
checksumming there. Recently i found  that on some interfaces (where is no 
shaper) gso left on. And yes, probably some traffic was queued, then route 
changed, and maybe it went from gso off interface to gso on... 
It can be just pure luck, that i dont have crashes anymore, but maybe trick is 
in gso...

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2010-04-30 16:47 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-11 20:38 NULL pointer dereference panic in stable (2.6.33.2), amd64 Denys Fedorysychenko
2010-04-11 22:35 ` Eric Dumazet
2010-04-11 23:04   ` Denys Fedorysychenko
2010-04-11 23:11     ` Eric Dumazet
2010-04-11 23:36       ` Denys Fedorysychenko
2010-04-12  3:38   ` Krishna Kumar2
2010-04-12  6:01     ` Eric Dumazet
2010-04-12  7:18       ` Eric Dumazet
2010-04-12  7:36         ` David Miller
2010-04-15  6:52         ` David Miller
2010-04-15  8:02           ` Eric Dumazet
2010-04-15  8:26             ` David Miller
2010-04-15  8:51               ` Eric Dumazet
2010-04-15  9:06                 ` David Miller
2010-04-15  9:11                   ` Denys Fedorysychenko
2010-04-15 10:37                     ` Eric Dumazet
2010-04-29 10:50                       ` Denys Fedorysychenko
2010-04-15 20:30                   ` Eric Dumazet
2010-04-15 20:46                     ` Eric Dumazet
2010-04-15 21:33                       ` David Miller
2010-04-16 22:18                         ` [PATCH net-next-2.6] net: Introduce skb_orphan_try() Eric Dumazet
2010-04-18  9:46                           ` David Miller
2010-04-21  6:08                             ` Eric Dumazet
2010-04-22  5:56                               ` David Miller
2010-04-22  7:10                                 ` Eric Dumazet
2010-04-22  7:16                                   ` David Miller
2010-04-22  7:24                                     ` Eric Dumazet
2010-04-22  7:26                                       ` David Miller
2010-04-22  7:33                                         ` Eric Dumazet
2010-04-22  7:41                                           ` David Miller
2010-04-22  7:47                                             ` Eric Dumazet
2010-04-22  7:54                                               ` David Miller
2010-04-22  7:59                                                 ` Eric Dumazet
2010-04-12  7:54       ` NULL pointer dereference panic in stable (2.6.33.2), amd64 Krishna Kumar2
2010-04-12  9:31         ` Eric Dumazet
2010-04-12 16:11           ` Denys Fedorysychenko
2010-04-12 20:09             ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.