* kernel panic in skb_copy_bits
From: Joe Jin @ 2013-06-27 2:58 UTC
To: Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li

Hi,

When we run a failover test with iSCSI + multipath by resetting the switches
on OVM (2.6.39), we hit this panic:

BUG: unable to handle kernel paging request at ffff88006d9e8d48
IP: [<ffffffff812605bb>] memcpy+0xb/0x120
PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
Oops: 0000 [#1] SMP
CPU 7
Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache

Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120
RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246
RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057
RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280
RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000
R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034
R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8
FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000
CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240)
Stack:
ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0
000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000
ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c
Call Trace:
<IRQ>
[<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0
[<ffffffff8142f173>] skb_copy+0xf3/0x120
[<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350
[<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10
[<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
[<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110
[<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
[<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220
[<ffffffff81075c39>] __do_softirq+0xb9/0x1d0
[<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70
[<ffffffff81511d3c>] call_softirq+0x1c/0x30
[<ffffffff810172e5>] do_softirq+0x65/0xa0
[<ffffffff8107656b>] irq_exit+0xab/0xc0
[<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50
[<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30
<EOI>
[<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
[<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20
[<ffffffff8101dfeb>] ? default_idle+0x5b/0x170
[<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0
[<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4
[<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10
Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
RIP [<ffffffff812605bb>] memcpy+0xb/0x120
RSP <ffff8801003c3d58>
CR2: ffff88006d9e8d48

Reviewing the vmcore, I found that skb->users is 1 at that moment. Checking the
network neighbour history, I found that skb_get() was replaced by skb_copy() in
commit 7e36763b2c:

commit 7e36763b2c204d59de4e88087f84a2c0c8421f25
Author: Frank Blaschka <frank.blaschka@de.ibm.com>
Date: Mon Mar 3 12:16:04 2008 -0800

    [NET]: Fix race in generic address resolution.

    neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
    has increased skbs refcount and calls solicit with the
    skb. neigh_timer_handler should not increase skbs refcount but make a
    copy of the skb and do solicit with the copy.

    Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

So can you please give some details of the race? Per the vmcore, it seems the skb
data had been freed; I suspect an skb_get() was lost somewhere.
With the above commit reverted, the panic did not occur during our testing.

Any input would be appreciated!

Best Regards,
Joe
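A quick cross-check of the register dump above, offered as a sketch rather than a definitive reading. The bytes just before and at the faulting instruction in the Code: line (48 89 f8, 89 d1, c1 e9 03, 83 e2 07, then <f3> 48 a5) decode to mov %rdi,%rax; mov %edx,%ecx; shr $3,%ecx; and $7,%edx; rep movsq. Assuming that is the memcpy prologue in this build, RCX holds the number of 8-byte words still to copy and RDX the leftover byte count. The struct and function names below are hypothetical helpers for the arithmetic, not kernel code.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Register values copied from the oops in this message. */
struct oops_regs {
    uint64_t rax, rcx, rdx, rsi, rdi, r14, cr2;
};

const struct oops_regs oops = {
    .rax = 0xffff880076b9e280ULL, /* memcpy return value: original destination */
    .rcx = 0x57,                  /* 8-byte words still to copy (rep movsq)     */
    .rdx = 0x0,                   /* len & 7: tail bytes left for rep movsb     */
    .rsi = 0xffff88006d9e8d48ULL, /* current source pointer                     */
    .rdi = 0xffff880076b9e280ULL, /* current destination pointer                */
    .r14 = 0x2b8,                 /* a caller register, compared below          */
    .cr2 = 0xffff88006d9e8d48ULL, /* faulting address                           */
};

/* Bytes memcpy still had to copy when it faulted. */
uint64_t bytes_remaining(const struct oops_regs *r)
{
    return r->rcx * 8 + r->rdx;
}

/* Bytes already written: how far the destination pointer has advanced. */
uint64_t bytes_copied(const struct oops_regs *r)
{
    return r->rdi - r->rax;
}

/* Non-zero when the fault address is the source operand of the copy. */
int fault_on_source(const struct oops_regs *r)
{
    return r->cr2 == r->rsi;
}

/* Print the three derived facts in one line each. */
void print_summary(const struct oops_regs *r)
{
    printf("remaining: 0x%" PRIx64 " bytes\n", bytes_remaining(r));
    printf("copied:    0x%" PRIx64 " bytes\n", bytes_copied(r));
    printf("fault on source: %s\n", fault_on_source(r) ? "yes" : "no");
}
```

Under those assumptions, nothing had been copied yet, 0x2b8 bytes were still outstanding (the same value as R14, which may well be the length being copied), and the fault address equals the source pointer. Together with "PTE 0" in the oops header, that is consistent with the source data not being mapped at the moment skb_copy() tried to read it.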
* Re: kernel panic in skb_copy_bits
From: Eric Dumazet @ 2013-06-27 5:31 UTC
To: Joe Jin
Cc: Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li

On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
> Hi,
>
> When we run a failover test with iSCSI + multipath by resetting the switches
> on OVM (2.6.39), we hit this panic:
>
> BUG: unable to handle kernel paging request at ffff88006d9e8d48
> IP: [<ffffffff812605bb>] memcpy+0xb/0x120
> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
> Oops: 0000 [#1] SMP
> CPU 7
> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>
> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
> RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120
> RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246
> RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057
> RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280
> RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000
> R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034
> R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8
> FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000
> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240)
> Stack:
> ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0
> 000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000
> ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c
> Call Trace:
> <IRQ>
> [<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0
> [<ffffffff8142f173>] skb_copy+0xf3/0x120
> [<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350
> [<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10
> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
> [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110
> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
> [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220
> [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0
> [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70
> [<ffffffff81511d3c>] call_softirq+0x1c/0x30
> [<ffffffff810172e5>] do_softirq+0x65/0xa0
> [<ffffffff8107656b>] irq_exit+0xab/0xc0
> [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50
> [<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30
> <EOI>
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20
> [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170
> [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0
> [<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4
> [<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10
> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
> RIP [<ffffffff812605bb>] memcpy+0xb/0x120
> RSP <ffff8801003c3d58>
> CR2: ffff88006d9e8d48
>
> Reviewing the vmcore, I found that skb->users is 1 at that moment. Checking the
> network neighbour history, I found that skb_get() was replaced by skb_copy() in
> commit 7e36763b2c:
>
> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25
> Author: Frank Blaschka <frank.blaschka@de.ibm.com>
> Date: Mon Mar 3 12:16:04 2008 -0800
>
>     [NET]: Fix race in generic address resolution.
>
>     neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
>     has increased skbs refcount and calls solicit with the
>     skb. neigh_timer_handler should not increase skbs refcount but make a
>     copy of the skb and do solicit with the copy.
>
>     Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
> So can you please give some details of the race? Per the vmcore, it seems the skb
> data had been freed; I suspect an skb_get() was lost somewhere.
> With the above commit reverted, the panic did not occur during our testing.
>
> Any input would be appreciated!

Well, the fact is that your crash is happening in skb_copy().

Frank's patch is OK. I suspect using skb_clone() would work too,
so if these skbs were fclone ready, the chance of a GFP_ATOMIC allocation
error would be smaller.

So something is providing a wrong skb at the very beginning.

You could try to do an early skb_copy() to catch the bug and see in the
stack trace what produced this buggy skb.

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 5c56b21..a7a51fd 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1010,6 +1010,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
 				NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards);
 			}
 			skb_dst_force(skb);
+			kfree_skb(skb_copy(skb, GFP_ATOMIC));
 			__skb_queue_tail(&neigh->arp_queue, skb);
 			neigh->arp_queue_len_bytes += skb->truesize;
 		}
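To make the distinction in this exchange concrete, here is a minimal userspace sketch of the difference between taking a reference (what skb_get() does) and making a private copy (what skb_copy() does), plus the copy-then-free probe from the diff above. Everything here is a toy: struct toy_buf and the toy_* functions are hypothetical stand-ins, not the kernel's sk_buff API, and the model ignores fragments, cloning and locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for struct sk_buff: a reference count plus an owned payload. */
struct toy_buf {
    int users;              /* plays the role of skb->users */
    size_t len;
    unsigned char *data;
};

/* Allocate a buffer holding a private copy of src[0..len). */
struct toy_buf *toy_alloc(const void *src, size_t len)
{
    struct toy_buf *b = malloc(sizeof(*b));
    if (!b)
        return NULL;
    b->data = malloc(len ? len : 1);
    if (!b->data) {
        free(b);
        return NULL;
    }
    memcpy(b->data, src, len);
    b->len = len;
    b->users = 1;
    return b;
}

/* skb_get() analogue: same buffer, one more reference. */
struct toy_buf *toy_get(struct toy_buf *b)
{
    b->users++;
    return b;
}

/* skb_copy() analogue: a new buffer; reads every payload byte of b. */
struct toy_buf *toy_copy(const struct toy_buf *b)
{
    return toy_alloc(b->data, b->len);
}

/* kfree_skb() analogue: drop one reference, free on the last one.
 * NULL is accepted and ignored. Returns 1 if the buffer was freed. */
int toy_put(struct toy_buf *b)
{
    if (!b)
        return 0;
    if (--b->users > 0)
        return 0;
    free(b->data);
    free(b);
    return 1;
}

/* The debugging probe in miniature: copy, then drop the copy at once.
 * The copy has to read the whole payload, so in the kernel an unreadable
 * payload would oops here, while the code that queued the buffer is still
 * on the stack. In this toy the return value only says whether the copy
 * could be allocated. */
int toy_early_check(const struct toy_buf *b)
{
    struct toy_buf *c = toy_copy(b);
    int ok = (c != NULL);

    toy_put(c);
    return ok;
}
```

A holder of toy_get() sees later changes made through any other reference, while a holder of toy_copy() does not; that is the trade-off the 2008 commit message describes. The probe costs one allocation and one full copy per queued buffer, so it is only something to carry in a debugging build. The one-line form in the diff relies on kfree_skb() ignoring a NULL argument, so a failed GFP_ATOMIC allocation is harmless there.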
* Re: kernel panic in skb_copy_bits
From: Joe Jin @ 2013-06-27 7:15 UTC
To: Eric Dumazet
Cc: Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li

Hi Eric,

Thanks for your response; I will test it and get back to you.

Regards,
Joe

On 06/27/13 13:31, Eric Dumazet wrote:
> On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote:
>> Hi,
>>
>> When we run a failover test with iSCSI + multipath by resetting the switches
>> on OVM (2.6.39), we hit this panic:
>>
>> BUG: unable to handle kernel paging request at ffff88006d9e8d48
>> IP: [<ffffffff812605bb>] memcpy+0xb/0x120
>> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
>> Oops: 0000 [#1] SMP
>> CPU 7
>> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>>
>> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
>> RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120
>> RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246
>> RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057
>> RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280
>> RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000
>> R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034
>> R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8
>> FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000
>> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
>> CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240)
>> Stack:
>> ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0
>> 000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000
>> ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c
>> Call Trace:
>> <IRQ>
>> [<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0
>> [<ffffffff8142f173>] skb_copy+0xf3/0x120
>> [<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350
>> [<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10
>> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
>> [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110
>> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
>> [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220
>> [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0
>> [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70
>> [<ffffffff81511d3c>] call_softirq+0x1c/0x30
>> [<ffffffff810172e5>] do_softirq+0x65/0xa0
>> [<ffffffff8107656b>] irq_exit+0xab/0xc0
>> [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50
>> [<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30
>> <EOI>
>> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>> [<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20
>> [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170
>> [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0
>> [<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4
>> [<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10
>> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
>> RIP [<ffffffff812605bb>] memcpy+0xb/0x120
>> RSP <ffff8801003c3d58>
>> CR2: ffff88006d9e8d48
>>
>> Reviewing the vmcore, I found that skb->users is 1 at that moment. Checking the
>> network neighbour history, I found that skb_get() was replaced by skb_copy() in
>> commit 7e36763b2c:
>>
>> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25
>> Author: Frank Blaschka <frank.blaschka@de.ibm.com>
>> Date: Mon Mar 3 12:16:04 2008 -0800
>>
>>     [NET]: Fix race in generic address resolution.
>>
>>     neigh_update sends skb from neigh->arp_queue while neigh_timer_handler
>>     has increased skbs refcount and calls solicit with the
>>     skb. neigh_timer_handler should not increase skbs refcount but make a
>>     copy of the skb and do solicit with the copy.
>>
>>     Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>> So can you please give some details of the race? Per the vmcore, it seems the skb
>> data had been freed; I suspect an skb_get() was lost somewhere.
>> With the above commit reverted, the panic did not occur during our testing.
>>
>> Any input would be appreciated!
>
> Well, the fact is that your crash is happening in skb_copy().
>
> Frank's patch is OK. I suspect using skb_clone() would work too,
> so if these skbs were fclone ready, the chance of a GFP_ATOMIC allocation
> error would be smaller.
>
> So something is providing a wrong skb at the very beginning.
>
> You could try to do an early skb_copy() to catch the bug and see in the
> stack trace what produced this buggy skb.
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 5c56b21..a7a51fd 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -1010,6 +1010,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
>  				NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards);
>  			}
>  			skb_dst_force(skb);
> +			kfree_skb(skb_copy(skb, GFP_ATOMIC));
>  			__skb_queue_tail(&neigh->arp_queue, skb);
>  			neigh->arp_queue_len_bytes += skb->truesize;
>  		}
* Re: kernel panic in skb_copy_bits 2013-06-27 5:31 ` Eric Dumazet @ 2013-06-28 4:17 ` Joe Jin -1 siblings, 0 replies; 64+ messages in thread From: Joe Jin @ 2013-06-28 4:17 UTC (permalink / raw) To: Eric Dumazet Cc: Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini I found a similar issue at http://www.gossamer-threads.com/lists/xen/devel/265611 so I have copied the Xen developers as well. On 06/27/13 13:31, Eric Dumazet wrote: > On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote: >> Hi, >> >> When we do fail over test with iscsi + multipath by reset the switches >> on OVM(2.6.39) we hit the panic: >> >> BUG: unable to handle kernel paging request at ffff88006d9e8d48 >> IP: [<ffffffff812605bb>] memcpy+0xb/0x120 >> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0 >> Oops: 0000 [#1] SMP >> CPU 7 >> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache >> >> >> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. 
PowerEdge 2950/0DP246 >> RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120 >> RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246 >> RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057 >> RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280 >> RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000 >> R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034 >> R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8 >> FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000 >> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b >> CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240) >> Stack: >> ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0 >> 000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000 >> ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c >> Call Trace: >> <IRQ> >> [<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0 >> [<ffffffff8142f173>] skb_copy+0xf3/0x120 >> [<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350 >> [<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10 >> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180 >> [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110 >> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180 >> [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220 >> [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0 >> [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70 >> [<ffffffff81511d3c>] call_softirq+0x1c/0x30 >> [<ffffffff810172e5>] do_softirq+0x65/0xa0 >> [<ffffffff8107656b>] irq_exit+0xab/0xc0 >> [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50 >> [<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30 >> <EOI> >> [<ffffffff810013aa>] ? 
xen_hypercall_sched_op+0xa/0x20 >> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 >> [<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20 >> [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170 >> [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0 >> [<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4 >> [<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10 >> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c >> RIP [<ffffffff812605bb>] memcpy+0xb/0x120 >> RSP <ffff8801003c3d58> >> CR2: ffff88006d9e8d48 >> >> Reviewed vmcore I found the skb->users is 1 at the moment, checked network neighbour >> history I found skb_get() be replaced by skb_copy by commit 7e36763b2c: >> >> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25 >> Author: Frank Blaschka <frank.blaschka@de.ibm.com> >> Date: Mon Mar 3 12:16:04 2008 -0800 >> >> [NET]: Fix race in generic address resolution. >> >> neigh_update sends skb from neigh->arp_queue while neigh_timer_handler >> has increased skbs refcount and calls solicit with the >> skb. neigh_timer_handler should not increase skbs refcount but make a >> copy of the skb and do solicit with the copy. >> >> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> >> Signed-off-by: David S. Miller <davem@davemloft.net> >> >> So can you please give some details of the race? per vmcore seems like the skb data >> be freed, I suspected skb_get() lost at somewhere? >> I reverted above commit the panic not occurred during our testing. >> >> Any input will appreciate! > > Well, fact is that your crash is happening in skb_copy(). > > Frank patch is OK. I suspect using skb_clone() would work too, > so if these skb were fclone ready, chance of an GFP_ATOMIC allocation > error would be smaller. > > So something is providing a wrong skb at the very beginning. 
> > You could try to do a early skb_copy to catch the bug and see in the > stack trace what produced this buggy skb. > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c > index 5c56b21..a7a51fd 100644 > --- a/net/core/neighbour.c > +++ b/net/core/neighbour.c > @@ -1010,6 +1010,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb) > NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards); > } > skb_dst_force(skb); > + kfree_skb(skb_copy(skb, GFP_ATOMIC)); > __skb_queue_tail(&neigh->arp_queue, skb); > neigh->arp_queue_len_bytes += skb->truesize; > } > > BUG: unable to handle kernel paging request at ffff8800488db8dc IP: [<ffffffff812605bb>] memcpy+0xb/0x120 PGD 1796067 PUD 20e5067 PMD 212a067 PTE 0 Oops: 0000 [#1] SMP CPU 13 Modules linked in: ocfs2 jbd2 xen_blkback xen_netback xen_gntdev xen_evtchn netconsole i2c_dev i2c_core ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs lockd sunrpc dm_round_robin dm_multipath bridge stp llc bonding be2iscsi iscsi_boot_sysfs iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc hed acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport serio_raw ixgbe hpilo tg3 hpwdt dca snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd iTCO_wdt iTCO_vendor_support soundcore snd_page_alloc pcspkr pata_acpi ata_generic dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ata_piix sg shpchp hpsa cciss sd_mod crc_t10dif ext3 jbd mbcache Pid: 0, comm: swapper Not tainted 2.6.39-300.32.1.el5uek.bug16929255v5 #1 HP ProLiant DL360p Gen8 RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120 RSP: e02b:ffff88005a9a3b68 EFLAGS: 00010202 RAX: ffff8800200f0280 RBX: 0000000000000724 RCX: 00000000000000e4 RDX: 0000000000000004 RSI: ffff8800488db8dc RDI: ffff8800200f0280 RBP: ffff88005a9a3bd0 R08: 0000000000000004 R09: 
ffff880052824980 R10: 0000000000000000 R11: 0000000000015048 R12: 0000000000000034 R13: 0000000000000034 R14: 00000000000022f4 R15: ffff880021208ab0 FS: 00007fe8737c96e0(0000) GS:ffff88005a9a0000(0000) knlGS:0000000000000000 CS: e033 DS: 002b ES: 002b CR0: 000000008005003b CR2: ffff8800488db8dc CR3: 000000004fb38000 CR4: 0000000000002660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff880054d36000, task ffff880054d343c0) Stack: ffffffff8142dac7 0000000000000000 00000000ffffffff ffff8800200f0280 0000075800000000 0000000000000724 ffff880054d36000 0000000000000000 00000000fffffdb4 ffff880052824980 ffff880021208ab0 000000000000024c Call Trace: <IRQ> [<ffffffff8142dac7>] ? skb_copy_bits+0x167/0x290 [<ffffffff8142f0b5>] skb_copy+0x85/0xb0 [<ffffffff8144864d>] __neigh_event_send+0x18d/0x200 [<ffffffff81449a42>] neigh_resolve_output+0x162/0x1b0 [<ffffffff81477046>] ip_finish_output+0x146/0x320 [<ffffffff814754a5>] ip_output+0x85/0xd0 [<ffffffff814758d9>] ip_local_out+0x29/0x30 [<ffffffff814761e0>] ip_queue_xmit+0x1c0/0x3d0 [<ffffffff8148d3ef>] tcp_transmit_skb+0x40f/0x520 [<ffffffff8148e5ff>] tcp_retransmit_skb+0x16f/0x2e0 [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0 [<ffffffff814905ad>] tcp_retransmit_timer+0x18d/0x4a0 [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0 [<ffffffff81490994>] tcp_write_timer+0xd4/0x100 [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110 [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0 [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220 [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0 [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70 [<ffffffff81511b7c>] call_softirq+0x1c/0x30 [<ffffffff810172e5>] do_softirq+0x65/0xa0 [<ffffffff8107656b>] irq_exit+0xab/0xc0 [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50 [<ffffffff81511bce>] xen_do_hypervisor_callback+0x1e/0x30 <EOI> [<ffffffff810013aa>] ? 
xen_hypercall_sched_op+0xa/0x20 [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 [<ffffffff8100a0d0>] ? xen_safe_halt+0x10/0x20 [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170 [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0 [<ffffffff8100a8e9>] ? xen_irq_enable_direct_reloc+0x4/0x4 [<ffffffff814f7a2e>] ? cpu_bringup_and_idle+0xe/0x10 Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c RIP [<ffffffff812605bb>] memcpy+0xb/0x120 Per the vmcore, the socket info is as follows: ------------------------------------------------------------------------------ <struct tcp_sock 0xffff88004d344e00> TCP tcp 10.1.1.11:42147 10.1.1.21:3260 FIN_WAIT1 windows: rcv=122124, snd=65535 advmss=8948 rcv_ws=1 snd_ws=0 nonagle=1 sack_ok=0 tstamp_ok=1 rmem_alloc=0, wmem_alloc=10229 rx_queue=0, tx_queue=149765 rcvbuf=262142, sndbuf=262142 rcv_tstamp=51.4 s, lsndtime=0.0 s ago -- Retransmissions -- retransmits=7, ca_state=TCP_CA_Disorder ------------------------------------------------------------------------------ When the socket state moves to FIN_WAIT1, will it clean up all of its skbs or not? Thanks, Joe ^ permalink raw reply [flat|nested] 64+ messages in thread
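A note on reading the two register dumps above. The bytes in the Code: line around the fault marker (48 89 f8, 89 d1, c1 e9 03, 83 e2 07, <f3> 48 a5) decode to mov %rdi,%rax; mov %edx,%ecx; shr $3,%ecx; and $7,%edx; rep movsq. If that decoding is right, RCX holds the number of 8-byte words still to copy, RDX the leftover tail bytes, RAX the original destination, and RSI the source address being read when the fault hit. The user-space sketch below only redoes that arithmetic on the values printed in the dumps; the struct and helper names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the registers printed in an x86-64 oops. */
struct oops_regs {
    uint64_t rax, rcx, rdx, rsi, rdi, cr2;
};

/* Bytes that memcpy still had to move when "rep movsq" faulted:
 * RCX counts remaining 8-byte words, RDX holds the 0..7 tail bytes. */
static uint64_t bytes_remaining(const struct oops_regs *r)
{
    assert(r->rdx < 8);   /* RDX was masked with 7 before the copy loop */
    return r->rcx * 8 + r->rdx;
}

/* Bytes already copied: RDI advances from the start address saved in RAX. */
static uint64_t bytes_copied(const struct oops_regs *r)
{
    return r->rdi - r->rax;
}

/* True when the faulting address (CR2) is the source pointer (RSI). */
static int fault_on_source(const struct oops_regs *r)
{
    return r->cr2 == r->rsi;
}

/* Values copied from the first oops (Dell PowerEdge 2950, CPU 7). */
static const struct oops_regs oops_first = {
    .rax = 0xffff880076b9e280ULL, .rcx = 0x57, .rdx = 0x0,
    .rsi = 0xffff88006d9e8d48ULL, .rdi = 0xffff880076b9e280ULL,
    .cr2 = 0xffff88006d9e8d48ULL,
};

/* Values copied from the second oops (HP ProLiant DL360p Gen8, CPU 13). */
static const struct oops_regs oops_second = {
    .rax = 0xffff8800200f0280ULL, .rcx = 0xe4, .rdx = 0x4,
    .rsi = 0xffff8800488db8dcULL, .rdi = 0xffff8800200f0280ULL,
    .cr2 = 0xffff8800488db8dcULL,
};
```

Under those assumptions both dumps tell the same story: nothing had been copied yet (RDI equals RAX), the fault is on the source side (CR2 equals RSI), and the pending lengths are 0x2b8 and 0x724 bytes, which match the values sitting in R14 of the first dump and RBX of the second. That fits a source buffer whose page is no longer mapped (PTE 0) rather than a bad destination.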
* Re: kernel panic in skb_copy_bits 2013-06-28 4:17 ` Joe Jin @ 2013-06-28 6:52 ` Eric Dumazet -1 siblings, 0 replies; 64+ messages in thread From: Eric Dumazet @ 2013-06-28 6:52 UTC (permalink / raw) To: Joe Jin Cc: Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini On Fri, 2013-06-28 at 12:17 +0800, Joe Jin wrote: > Find a similar issue http://www.gossamer-threads.com/lists/xen/devel/265611 > So copied to Xen developer as well. > > On 06/27/13 13:31, Eric Dumazet wrote: > > On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote: > >> Hi, > >> > >> When we do fail over test with iscsi + multipath by reset the switches > >> on OVM(2.6.39) we hit the panic: > >> > >> BUG: unable to handle kernel paging request at ffff88006d9e8d48 > >> IP: [<ffffffff812605bb>] memcpy+0xb/0x120 > >> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0 > >> Oops: 0000 [#1] SMP > >> CPU 7 > >> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache > >> > >> > >> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246 > >> RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120 > >> RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246 > >> RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057 > >> RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280 > >> RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000 > >> R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034 > >> R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8 > >> FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000 > >> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b > >> CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660 > >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > >> Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240) > >> Stack: > >> ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0 > >> 000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000 > >> ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c > >> Call Trace: > >> <IRQ> > >> [<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0 > >> [<ffffffff8142f173>] skb_copy+0xf3/0x120 > >> [<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350 > >> [<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10 > >> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180 > >> [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110 > >> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180 > >> [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220 > >> [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0 > >> [<ffffffff810d9678>] ? 
handle_percpu_irq+0x48/0x70 > >> [<ffffffff81511d3c>] call_softirq+0x1c/0x30 > >> [<ffffffff810172e5>] do_softirq+0x65/0xa0 > >> [<ffffffff8107656b>] irq_exit+0xab/0xc0 > >> [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50 > >> [<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30 > >> <EOI> > >> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 > >> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 > >> [<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20 > >> [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170 > >> [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0 > >> [<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4 > >> [<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10 > >> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c > >> RIP [<ffffffff812605bb>] memcpy+0xb/0x120 > >> RSP <ffff8801003c3d58> > >> CR2: ffff88006d9e8d48 > >> > >> Reviewed vmcore I found the skb->users is 1 at the moment, checked network neighbour > >> history I found skb_get() be replaced by skb_copy by commit 7e36763b2c: > >> > >> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25 > >> Author: Frank Blaschka <frank.blaschka@de.ibm.com> > >> Date: Mon Mar 3 12:16:04 2008 -0800 > >> > >> [NET]: Fix race in generic address resolution. > >> > >> neigh_update sends skb from neigh->arp_queue while neigh_timer_handler > >> has increased skbs refcount and calls solicit with the > >> skb. neigh_timer_handler should not increase skbs refcount but make a > >> copy of the skb and do solicit with the copy. > >> > >> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> > >> Signed-off-by: David S. Miller <davem@davemloft.net> > >> > >> So can you please give some details of the race? per vmcore seems like the skb data > >> be freed, I suspected skb_get() lost at somewhere? > >> I reverted above commit the panic not occurred during our testing. 
> >>
> >> Any input will appreciate!
> >
> > Well, fact is that your crash is happening in skb_copy().
> >
> > Frank patch is OK. I suspect using skb_clone() would work too,
> > so if these skb were fclone ready, chance of an GFP_ATOMIC allocation
> > error would be smaller.
> >
> > So something is providing a wrong skb at the very beginning.
> >
> > You could try to do a early skb_copy to catch the bug and see in the
> > stack trace what produced this buggy skb.
> >
> > diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> > index 5c56b21..a7a51fd 100644
> > --- a/net/core/neighbour.c
> > +++ b/net/core/neighbour.c
> > @@ -1010,6 +1010,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb)
> >  				NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards);
> >  			}
> >  			skb_dst_force(skb);
> > +			kfree_skb(skb_copy(skb, GFP_ATOMIC));
> >  			__skb_queue_tail(&neigh->arp_queue, skb);
> >  			neigh->arp_queue_len_bytes += skb->truesize;
> >  		}
> >
> >
>
> BUG: unable to handle kernel paging request at ffff8800488db8dc
> IP: [<ffffffff812605bb>] memcpy+0xb/0x120
> PGD 1796067 PUD 20e5067 PMD 212a067 PTE 0
> Oops: 0000 [#1] SMP
> CPU 13
> Modules linked in: ocfs2 jbd2 xen_blkback xen_netback xen_gntdev xen_evtchn netconsole i2c_dev i2c_core ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs lockd sunrpc dm_round_robin dm_multipath bridge stp llc bonding be2iscsi iscsi_boot_sysfs iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc hed acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport serio_raw ixgbe hpilo tg3 hpwdt dca snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd iTCO_wdt iTCO_vendor_support soundcore snd_page_alloc pcspkr pata_acpi ata_generic dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ata_piix sg shpchp hpsa cciss sd_mod crc_t10dif ext3 jbd mbcache
>
> Pid: 0, comm: swapper Not tainted 2.6.39-300.32.1.el5uek.bug16929255v5 #1 HP ProLiant DL360p Gen8
> RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120
> RSP: e02b:ffff88005a9a3b68 EFLAGS: 00010202
> RAX: ffff8800200f0280 RBX: 0000000000000724 RCX: 00000000000000e4
> RDX: 0000000000000004 RSI: ffff8800488db8dc RDI: ffff8800200f0280
> RBP: ffff88005a9a3bd0 R08: 0000000000000004 R09: ffff880052824980
> R10: 0000000000000000 R11: 0000000000015048 R12: 0000000000000034
> R13: 0000000000000034 R14: 00000000000022f4 R15: ffff880021208ab0
> FS: 00007fe8737c96e0(0000) GS:ffff88005a9a0000(0000) knlGS:0000000000000000
> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: ffff8800488db8dc CR3: 000000004fb38000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff880054d36000, task ffff880054d343c0)
> Stack:
>  ffffffff8142dac7 0000000000000000 00000000ffffffff ffff8800200f0280
>  0000075800000000 0000000000000724 ffff880054d36000 0000000000000000
>  00000000fffffdb4 ffff880052824980 ffff880021208ab0 000000000000024c
> Call Trace:
>  <IRQ>
>  [<ffffffff8142dac7>] ? skb_copy_bits+0x167/0x290
>  [<ffffffff8142f0b5>] skb_copy+0x85/0xb0
>  [<ffffffff8144864d>] __neigh_event_send+0x18d/0x200
>  [<ffffffff81449a42>] neigh_resolve_output+0x162/0x1b0
>  [<ffffffff81477046>] ip_finish_output+0x146/0x320
>  [<ffffffff814754a5>] ip_output+0x85/0xd0
>  [<ffffffff814758d9>] ip_local_out+0x29/0x30
>  [<ffffffff814761e0>] ip_queue_xmit+0x1c0/0x3d0
>  [<ffffffff8148d3ef>] tcp_transmit_skb+0x40f/0x520
>  [<ffffffff8148e5ff>] tcp_retransmit_skb+0x16f/0x2e0
>  [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0
>  [<ffffffff814905ad>] tcp_retransmit_timer+0x18d/0x4a0
>  [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0
>  [<ffffffff81490994>] tcp_write_timer+0xd4/0x100
>  [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110
>  [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0
>  [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220
>  [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0
>  [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70
>  [<ffffffff81511b7c>] call_softirq+0x1c/0x30
>  [<ffffffff810172e5>] do_softirq+0x65/0xa0
>  [<ffffffff8107656b>] irq_exit+0xab/0xc0
>  [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50
>  [<ffffffff81511bce>] xen_do_hypervisor_callback+0x1e/0x30
>  <EOI>
>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>  [<ffffffff8100a0d0>] ? xen_safe_halt+0x10/0x20
>  [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170
>  [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0
>  [<ffffffff8100a8e9>] ? xen_irq_enable_direct_reloc+0x4/0x4
>  [<ffffffff814f7a2e>] ? cpu_bringup_and_idle+0xe/0x10
> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
> RIP [<ffffffff812605bb>] memcpy+0xb/0x120
>
>
> Per vmcore, the socket info as below:
> ------------------------------------------------------------------------------
> <struct tcp_sock 0xffff88004d344e00> TCP
> tcp 10.1.1.11:42147 10.1.1.21:3260 FIN_WAIT1
> windows: rcv=122124, snd=65535 advmss=8948 rcv_ws=1 snd_ws=0
> nonagle=1 sack_ok=0 tstamp_ok=1
> rmem_alloc=0, wmem_alloc=10229
> rx_queue=0, tx_queue=149765
> rcvbuf=262142, sndbuf=262142
> rcv_tstamp=51.4 s, lsndtime=0.0 s ago
> -- Retransmissions --
> retransmits=7, ca_state=TCP_CA_Disorder
> ------------------------------------------------------------------------------
>
> When sock status move to FIN_WAIT1, will it cleanup all skb or no?

I get crashes as well using a UDP application. It's not related to TCP.

There is some corruption going on in neighbour code.

[ 942.319645] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 942.327510] IP: [<ffffffff814e4558>] __neigh_event_send+0x1a8/0x240
[ 942.333799] PGD c5a125067 PUD c603e1067 PMD 0
[ 942.338292] Oops: 0002 [#1] SMP
[ 942.341819] gsmi: Log Shutdown Reason 0x03
[ 942.364995] CPU: 8 PID: 13760 Comm: netperf Tainted: G W 3.10.0-smp-DEV #155
[ 942.380212] task: ffff88065b54b000 ti: ffff8806498fc000 task.ti: ffff8806498fc000
[ 942.387689] RIP: 0010:[<ffffffff814e4558>] [<ffffffff814e4558>] __neigh_event_send+0x1a8/0x240
[ 942.396402] RSP: 0018:ffff8806498fd9d8 EFLAGS: 00010206
[ 942.401709] RAX: 0000000000000000 RBX: ffff88065a8f9000 RCX: ffff88065fdf61c0
[ 942.408837] RDX: 0000000000000000 RSI: ffff880c5d5b3080 RDI: ffff880c5b9c0ac0
[ 942.415966] RBP: ffff8806498fd9f8 R08: ffff88064cb00000 R09: ffff8806498fda70
[ 942.423095] R10: ffff880c5ffbead0 R11: ffffffff815137d0 R12: ffff88065a8f9030
[ 942.430232] R13: ffff880c5d5b3080 R14: 0000000000000000 R15: ffff88065b4af940
[ 942.437362] FS: 00007fd613190700(0000) GS:ffff880c7fc40000(0000) knlGS:0000000000000000
[ 942.445452] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 942.451193] CR2: 0000000000000008 CR3: 0000000c59b60000 CR4: 00000000000007e0
[ 942.458324] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 942.465460] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun 27 05:49:12 [ 942.472597] Stack:
[ 942.475997] ffff880c5d5b3080 ffff88065a8f9000 ffff880c59ac43c0 0000000000000088
[ 942.483473] ffff8806498fda48 ffffffff814e50db ffff880c5d5b3080 ffffffff81514c60
[ 942.490947] 0000000000000088 ffff88064cb00000 ffff880c5d5b3080 ffff880c59ac43c0
[ 942.498415] Call Trace:
[ 942.500873] [<ffffffff814e50db>] neigh_resolve_output+0x14b/0x1f0
lpq84 kernel: [ [ 942.507056] [<ffffffff81514c60>] ? __ip_append_data.isra.39+0x9e0/0x9e0
[ 942.515138] [<ffffffff81514ddf>] ip_finish_output+0x17f/0x380
[ 942.520972] [<ffffffff81515bb3>] ip_output+0x53/0x90
 942.341819] gsm[ 942.526030] [<ffffffff815167d6>] ? ip_make_skb+0xf6/0x120
[ 942.532897] [<ffffffff81515379>] ip_local_out+0x29/0x30
i: Log Shutdown [ 942.538215] [<ffffffff81516649>] ip_send_skb+0x19/0x50
Reason 0x03
[ 942.544825] [<ffffffff8153a65e>] udp_send_skb+0x2ce/0x3a0
[ 942.551439] [<ffffffff815137d0>] ? ip_setup_cork+0x110/0x110

^ permalink raw reply [flat|nested] 64+ messages in thread
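The debugging patch quoted in the message above makes and immediately frees a copy of the skb at enqueue time, so that a bad buffer faults in the call chain of whoever queued it rather than later in a timer handler. A rough userspace sketch of that idea follows. It is only an illustration: struct buf, struct buf_queue, probe_payload() and enqueue_checked() are hypothetical names, not kernel APIs, and a volatile read loop stands in for the throwaway skb_copy() so the compiler cannot drop the access.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an sk_buff: payload pointer plus length. */
struct buf {
	const unsigned char *data;
	size_t len;
};

#define QUEUE_MAX 8

/* Hypothetical fixed-size queue, standing in for neigh->arp_queue. */
struct buf_queue {
	struct buf slots[QUEUE_MAX];
	size_t count;
};

/*
 * Read every payload byte once and return the byte sum.
 * The volatile access forces the reads to happen, so a payload that is
 * already unmapped faults here, in the producer's context.
 */
static unsigned probe_payload(const struct buf *b)
{
	const volatile unsigned char *p = b->data;
	unsigned sum = 0;
	size_t i;

	for (i = 0; i < b->len; i++)
		sum += p[i];
	return sum;
}

/* Queue b only after probing it. Returns 0 on success, -1 if full. */
static int enqueue_checked(struct buf_queue *q, struct buf b)
{
	assert(q != NULL);
	if (q->count == QUEUE_MAX)
		return -1;
	(void)probe_payload(&b);	/* a bad buffer is caught at this point */
	q->slots[q->count++] = b;
	return 0;
}
```

As with the kernel patch, the probe does not fix anything; it only moves the failure earlier so that the stack trace names the code path that supplied the buffer.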
* Re: kernel panic in skb_copy_bits
  2013-06-28 6:52 ` Eric Dumazet
  (?)
@ 2013-06-28 9:37 ` Eric Dumazet
  -1 siblings, 0 replies; 64+ messages in thread
From: Eric Dumazet @ 2013-06-28 9:37 UTC (permalink / raw)
  To: Joe Jin
  Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Jan Beulich, David S. Miller

OK please try the following patch


[PATCH] neighbour: fix a race in neigh_destroy()

There is a race in neighbour code, because neigh_destroy() uses
skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
while other parts of the code assume neighbour rwlock is what
protects arp_queue

Convert all skb_queue_purge() calls to the __skb_queue_purge() variant

Use __skb_queue_head_init() instead of skb_queue_head_init()
to make clear we do not use arp_queue.lock

And hold neigh->lock in neigh_destroy() to close the race.

Reported-by: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/neighbour.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 2569ab2..b7de821 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -231,7 +231,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
 				   we must kill timers etc. and move
 				   it to safe state.
 				 */
-				skb_queue_purge(&n->arp_queue);
+				__skb_queue_purge(&n->arp_queue);
 				n->arp_queue_len_bytes = 0;
 				n->output = neigh_blackhole;
 				if (n->nud_state & NUD_VALID)
@@ -286,7 +286,7 @@ static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device
 	if (!n)
 		goto out_entries;
 
-	skb_queue_head_init(&n->arp_queue);
+	__skb_queue_head_init(&n->arp_queue);
 	rwlock_init(&n->lock);
 	seqlock_init(&n->ha_lock);
 	n->updated = n->used = now;
@@ -708,7 +708,9 @@ void neigh_destroy(struct neighbour *neigh)
 	if (neigh_del_timer(neigh))
 		pr_warn("Impossible event\n");
 
-	skb_queue_purge(&neigh->arp_queue);
+	write_lock_bh(&neigh->lock);
+	__skb_queue_purge(&neigh->arp_queue);
+	write_unlock_bh(&neigh->lock);
 	neigh->arp_queue_len_bytes = 0;
 
 	if (dev->netdev_ops->ndo_neigh_destroy)
@@ -858,7 +860,7 @@ static void neigh_invalidate(struct neighbour *neigh)
 		neigh->ops->error_report(neigh, skb);
 		write_lock(&neigh->lock);
 	}
-	skb_queue_purge(&neigh->arp_queue);
+	__skb_queue_purge(&neigh->arp_queue);
 	neigh->arp_queue_len_bytes = 0;
 }
 
@@ -1210,7 +1212,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
 
 			write_lock_bh(&neigh->lock);
 		}
-		skb_queue_purge(&neigh->arp_queue);
+		__skb_queue_purge(&neigh->arp_queue);
 		neigh->arp_queue_len_bytes = 0;
 	}
 out:

^ permalink raw reply related [flat|nested] 64+ messages in thread
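The locking convention described in the patch above (a queue with no lock of its own, where every access, including the final purge, must happen under the owner's lock) can be sketched in userspace C roughly as follows. This is an illustration under stated assumptions, not kernel code: struct entry, struct pkt_queue and all function names are hypothetical, and a C11 atomic_flag spinlock stands in for neigh->lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an sk_buff on a singly linked list. */
struct pkt {
	struct pkt *next;
};

/* Queue without a lock of its own, like arp_queue after the patch. */
struct pkt_queue {
	struct pkt *head;
	size_t len;
};

/* Hypothetical owner object, standing in for struct neighbour. */
struct entry {
	atomic_flag lock;	/* stands in for neigh->lock */
	struct pkt_queue q;
};

static void entry_init(struct entry *e)
{
	atomic_flag_clear(&e->lock);
	e->q.head = NULL;
	e->q.len = 0;
}

static void entry_lock(struct entry *e)
{
	while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void entry_unlock(struct entry *e)
{
	atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* "Unlocked" helpers: the caller must already hold e->lock. */
static void queue_push_unlocked(struct pkt_queue *q, struct pkt *p)
{
	p->next = q->head;
	q->head = p;
	q->len++;
}

static void queue_purge_unlocked(struct pkt_queue *q)
{
	while (q->head) {
		struct pkt *p = q->head;

		q->head = p->next;
		q->len--;
		free(p);
	}
	assert(q->len == 0);
}

/* Enqueue path: allocates a packet and queues it under the owner lock. */
static int entry_enqueue(struct entry *e)
{
	struct pkt *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	entry_lock(e);
	queue_push_unlocked(&e->q, p);
	entry_unlock(e);
	return 0;
}

/* Teardown path: the purge takes the same lock as every other accessor. */
static void entry_destroy(struct entry *e)
{
	entry_lock(e);
	queue_purge_unlocked(&e->q);
	entry_unlock(e);
}
```

The point of the sketch is that entry_destroy() and entry_enqueue() serialize on the same lock. If the teardown path used a different lock, or none, a concurrent enqueue could observe a half-updated list, which is the kind of race the patch description refers to.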
* Re: kernel panic in skb_copy_bits
  2013-06-28 9:37 ` Eric Dumazet
@ 2013-06-28 11:33 ` Joe Jin
  2013-06-28 11:33 ` Joe Jin
  ` (4 subsequent siblings)
  5 siblings, 0 replies; 64+ messages in thread
From: Joe Jin @ 2013-06-28 11:33 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Jan Beulich, David S. Miller

Hi Eric,

Thanks for your patch, I'll test it then get back to you.

Regards,
Joe

On 06/28/13 17:37, Eric Dumazet wrote:
> OK please try the following patch
>
>
> [PATCH] neighbour: fix a race in neigh_destroy()
>
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rwlock is what
> protects arp_queue
>
> Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
>
> Use __skb_queue_head_init() instead of skb_queue_head_init()
> to make clear we do not use arp_queue.lock
>
> And hold neigh->lock in neigh_destroy() to close the race.
>
> Reported-by: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/core/neighbour.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2569ab2..b7de821 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -231,7 +231,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
>  				   we must kill timers etc. and move
>  				   it to safe state.
>  				 */
> -				skb_queue_purge(&n->arp_queue);
> +				__skb_queue_purge(&n->arp_queue);
>  				n->arp_queue_len_bytes = 0;
>  				n->output = neigh_blackhole;
>  				if (n->nud_state & NUD_VALID)
> @@ -286,7 +286,7 @@ static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device
>  	if (!n)
>  		goto out_entries;
>
> -	skb_queue_head_init(&n->arp_queue);
> +	__skb_queue_head_init(&n->arp_queue);
>  	rwlock_init(&n->lock);
>  	seqlock_init(&n->ha_lock);
>  	n->updated = n->used = now;
> @@ -708,7 +708,9 @@ void neigh_destroy(struct neighbour *neigh)
>  	if (neigh_del_timer(neigh))
>  		pr_warn("Impossible event\n");
>
> -	skb_queue_purge(&neigh->arp_queue);
> +	write_lock_bh(&neigh->lock);
> +	__skb_queue_purge(&neigh->arp_queue);
> +	write_unlock_bh(&neigh->lock);
>  	neigh->arp_queue_len_bytes = 0;
>
>  	if (dev->netdev_ops->ndo_neigh_destroy)
> @@ -858,7 +860,7 @@ static void neigh_invalidate(struct neighbour *neigh)
>  		neigh->ops->error_report(neigh, skb);
>  		write_lock(&neigh->lock);
>  	}
> -	skb_queue_purge(&neigh->arp_queue);
> +	__skb_queue_purge(&neigh->arp_queue);
>  	neigh->arp_queue_len_bytes = 0;
>  }
>
> @@ -1210,7 +1212,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>
>  			write_lock_bh(&neigh->lock);
>  		}
> -		skb_queue_purge(&neigh->arp_queue);
> +		__skb_queue_purge(&neigh->arp_queue);
>  		neigh->arp_queue_len_bytes = 0;
>  	}
>  out:
>
>

^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits
  2013-06-28 9:37 ` Eric Dumazet
@ 2013-06-28 23:36 ` Joe Jin
From: Joe Jin @ 2013-06-28 23:36 UTC
To: Eric Dumazet
Cc: Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

Hi Eric,

The patch does not fix the issue; the panic is the same as the one I posted earlier:

> BUG: unable to handle kernel paging request at ffff88006d9e8d48
> IP: [<ffffffff812605bb>] memcpy+0xb/0x120
> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
> Oops: 0000 [#1] SMP
> CPU 7
> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>
> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
> RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120
> RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246
> RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057
> RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280
> RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000
> R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034
> R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8
> FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000
> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240)
> Stack:
> ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0
> 000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000
> ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c
> Call Trace:
> <IRQ>
> [<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0
> [<ffffffff8142f173>] skb_copy+0xf3/0x120
> [<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350
> [<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10
> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
> [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110
> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
> [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220
> [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0
> [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70
> [<ffffffff81511d3c>] call_softirq+0x1c/0x30
> [<ffffffff810172e5>] do_softirq+0x65/0xa0
> [<ffffffff8107656b>] irq_exit+0xab/0xc0
> [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50
> [<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30
> <EOI>
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20
> [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170
> [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0
> [<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4
> [<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10
> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
> RIP [<ffffffff812605bb>] memcpy+0xb/0x120
> RSP <ffff8801003c3d58>
> CR2: ffff88006d9e8d48

Thanks,
Joe

On 06/28/13 17:37, Eric Dumazet wrote:
> OK please try the following patch
>
>
> [PATCH] neighbour: fix a race in neigh_destroy()
>
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rwlock is what
> protects arp_queue
>
> Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
>
> Use __skb_queue_head_init() instead of skb_queue_head_init()
> to make clear we do not use arp_queue.lock
>
> And hold neigh->lock in neigh_destroy() to close the race.
>
> Reported-by: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/core/neighbour.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2569ab2..b7de821 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -231,7 +231,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
>  				   we must kill timers etc. and move
>  				   it to safe state.
>  				 */
> -				skb_queue_purge(&n->arp_queue);
> +				__skb_queue_purge(&n->arp_queue);
>  				n->arp_queue_len_bytes = 0;
>  				n->output = neigh_blackhole;
>  				if (n->nud_state & NUD_VALID)
> @@ -286,7 +286,7 @@ static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device
>  	if (!n)
>  		goto out_entries;
>
> -	skb_queue_head_init(&n->arp_queue);
> +	__skb_queue_head_init(&n->arp_queue);
>  	rwlock_init(&n->lock);
>  	seqlock_init(&n->ha_lock);
>  	n->updated = n->used = now;
> @@ -708,7 +708,9 @@ void neigh_destroy(struct neighbour *neigh)
>  	if (neigh_del_timer(neigh))
>  		pr_warn("Impossible event\n");
>
> -	skb_queue_purge(&neigh->arp_queue);
> +	write_lock_bh(&neigh->lock);
> +	__skb_queue_purge(&neigh->arp_queue);
> +	write_unlock_bh(&neigh->lock);
>  	neigh->arp_queue_len_bytes = 0;
>
>  	if (dev->netdev_ops->ndo_neigh_destroy)
> @@ -858,7 +860,7 @@ static void neigh_invalidate(struct neighbour *neigh)
>  		neigh->ops->error_report(neigh, skb);
>  		write_lock(&neigh->lock);
>  	}
> -	skb_queue_purge(&neigh->arp_queue);
> +	__skb_queue_purge(&neigh->arp_queue);
>  	neigh->arp_queue_len_bytes = 0;
>  }
>
> @@ -1210,7 +1212,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>
>  			write_lock_bh(&neigh->lock);
>  		}
> -		skb_queue_purge(&neigh->arp_queue);
> +		__skb_queue_purge(&neigh->arp_queue);
>  		neigh->arp_queue_len_bytes = 0;
>  	}
>  out:
>
>

-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing
* Re: kernel panic in skb_copy_bits
  2013-06-28 23:36 ` Joe Jin
@ 2013-06-29 7:04 ` Eric Dumazet
From: Eric Dumazet @ 2013-06-29 7:04 UTC
To: Joe Jin
Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Jan Beulich, David S. Miller

On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
> Hi Eric,
>
> The patch does not fix the issue; the panic is the same as the one I posted earlier:

At least it fixes my own panics ;)

My test bed was:

Launch 24 concurrent "netperf -t UDP_STREAM -H destination -- -m 128".

Then, on "destination", disconnect the Ethernet port.

While the link flapped, I got a panic within a few seconds.

Thanks
* Re: kernel panic in skb_copy_bits
  2013-06-28 23:36 ` Joe Jin
@ 2013-06-29 7:20 ` Eric Dumazet
From: Eric Dumazet @ 2013-06-29 7:20 UTC
To: Joe Jin
Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Jan Beulich, David S. Miller

On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
> Hi Eric,
>
> The patch does not fix the issue; the panic is the same as the one I posted earlier:
> > BUG: unable to handle kernel paging request at ffff88006d9e8d48
> > IP: [<ffffffff812605bb>] memcpy+0xb/0x120
> > PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
> > Oops: 0000 [#1] SMP
> > CPU 7
> > Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
> >
> > Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246

By the way, my patch was for current kernels, not for 2.6.39.

For instance, I was not able to reproduce the crash with 3.3.

RCU in neighbour code was added in 2.6.37, but it looks like this code
is a bit fragile because all the kfree_skb() calls are done while
neighbour locks are held.

So if an skb destructor triggers a new call to neighbour code, I presume
some bad things can happen. LOCKDEP could possibly help to detect this.

You could try replacing these kfree_skb() calls with
dev_kfree_skb_irq(), just in case. (Do not forget the
__skb_queue_purge() ones.)

Try a LOCKDEP build as well.
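The concern above (skbs freed while the neighbour lock is held, so that a destructor calling back into neighbour code would try to take the same lock) can be illustrated in userspace. The following is only a minimal sketch under stated assumptions, not kernel code: `toy_lock`, `pkt`, `pkt_queue` and every function name are hypothetical stand-ins for `neigh->lock`, `struct sk_buff` and `arp_queue`. It does not reproduce how dev_kfree_skb_irq() works internally; it only shows the general idea of detaching the queue under the lock and running destructors after the lock is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy non-recursive lock (hypothetical): acquiring it while it is already
 * held trips the assert, standing in for a self-deadlock on neigh->lock. */
struct toy_lock {
	int held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

struct pkt_queue;

/* Hypothetical stand-in for struct sk_buff. */
struct pkt {
	struct pkt *next;
	struct pkt_queue *owner;
	void (*destructor)(struct pkt *);
};

/* Hypothetical stand-in for a neighbour entry and its arp_queue. */
struct pkt_queue {
	struct toy_lock lock;
	struct pkt *head;
	unsigned int len;
	unsigned int destructor_calls;
};

/* Reads the queue length under the lock. */
static unsigned int pkt_queue_len(struct pkt_queue *q)
{
	unsigned int len;

	toy_lock_acquire(&q->lock);
	len = q->len;
	toy_lock_release(&q->lock);
	return len;
}

/* A destructor that re-enters queue code and therefore takes the lock. */
static void reentrant_destructor(struct pkt *p)
{
	(void)pkt_queue_len(p->owner);
	p->owner->destructor_calls++;
}

/* Allocates one packet and links it at the head of the queue. */
static int pkt_queue_add(struct pkt_queue *q)
{
	struct pkt *p = calloc(1, sizeof(*p));

	if (!p)
		return -1;
	p->owner = q;
	p->destructor = reentrant_destructor;

	toy_lock_acquire(&q->lock);
	p->next = q->head;
	q->head = p;
	q->len++;
	toy_lock_release(&q->lock);
	return 0;
}

/* Detaches the whole list under the lock, then runs destructors and frees
 * the packets only after the lock has been released. Returns the number of
 * packets freed. */
static unsigned int pkt_queue_purge(struct pkt_queue *q)
{
	struct pkt *list, *next;
	unsigned int freed = 0;

	toy_lock_acquire(&q->lock);
	list = q->head;
	q->head = NULL;
	q->len = 0;
	toy_lock_release(&q->lock);

	for (; list; list = next) {
		next = list->next;
		if (list->destructor)
			list->destructor(list);
		free(list);
		freed++;
	}
	return freed;
}
```

If the destructor loop were moved inside the locked region, `reentrant_destructor()` would hit the assert in `toy_lock_acquire()`. That is roughly the kind of recursion a LOCKDEP build is meant to report in the real kernel.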
* Re: kernel panic in skb_copy_bits
  2013-06-29 7:20 ` Eric Dumazet
@ 2013-06-29 16:11 ` Ben Greear
From: Ben Greear @ 2013-06-29 16:11 UTC
To: Eric Dumazet
Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, Joe Jin, linux-kernel, Xen Devel, Jan Beulich, David S. Miller

On 06/29/2013 12:20 AM, Eric Dumazet wrote:
> On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
>> Hi Eric,
>>
>> The patch does not fix the issue; the panic is the same as the one I posted earlier:
>>> BUG: unable to handle kernel paging request at ffff88006d9e8d48
>>> IP: [<ffffffff812605bb>] memcpy+0xb/0x120
>>> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
>>> Oops: 0000 [#1] SMP
>>> CPU 7
>>> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>>>
>>> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
>
>
> By the way, my patch was for current kernels, not for 2.6.39.

Do you know if your patch should go in 3.9?

Your test case sounds a bit like what gives us the rare crash in tcp_collapse
(we have lots of bouncing wifi interfaces running slow-speed TCP traffic). But
it takes days for us to hit the problem most of the time.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: kernel panic in skb_copy_bits
  2013-06-29 16:11 ` Ben Greear
@ 2013-06-29 16:26 ` Eric Dumazet
From: Eric Dumazet @ 2013-06-29 16:26 UTC
To: Ben Greear
Cc: Joe Jin, Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

On Sat, 2013-06-29 at 09:11 -0700, Ben Greear wrote:

> Do you know if your patch should go in 3.9?
>

Yes it should.

> Your test case sounds a bit like what gives us the rare crash in tcp_collapse
> (we have lots of bouncing wifi interfaces running slow-speed TCP traffic). But
> it takes days for us to hit the problem most of the time.

Well, unfortunately that's a different problem :(
* Re: kernel panic in skb_copy_bits
  2013-06-29 16:26 ` Eric Dumazet
@ 2013-06-29 16:31 ` Ben Greear
From: Ben Greear @ 2013-06-29 16:31 UTC
To: Eric Dumazet
Cc: Joe Jin, Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

On 06/29/2013 09:26 AM, Eric Dumazet wrote:
> On Sat, 2013-06-29 at 09:11 -0700, Ben Greear wrote:
>
>> Do you know if your patch should go in 3.9?
>>
>
> Yes it should.

Ok, I'll add that to my tree.

>> Your test case sounds a bit like what gives us the rare crash in tcp_collapse
>> (we have lots of bouncing wifi interfaces running slow-speed TCP traffic). But
>> it takes days for us to hit the problem most of the time.
>
> Well, unfortunately that's a different problem :(

For what it's worth, I added this patch to my tree. We haven't hit the
problem since, but perhaps on the over-the-weekend run we'll see it.

commit 0286716b36a0e5b82c385052a0971f44bc3c3442
Author: Ben Greear <greearb@candelatech.com>
Date:   Tue Jun 25 15:49:52 2013 -0700

    tcp: Try to work around crash in tcp_collapse.

    And print out some info about why it crashed.

    Signed-off-by: Ben Greear <greearb@candelatech.com>

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a2f267a..63f7704 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4810,7 +4810,15 @@ restart:
 			int offset = start - TCP_SKB_CB(skb)->seq;
 			int size = TCP_SKB_CB(skb)->end_seq - start;
 
-			BUG_ON(offset < 0);
+			if (WARN_ON(offset < 0)) {
+				/* We see a crash here (when using BUG_ON) every few days under
+				 * some torture tests. I'm not sure how to clean this up properly,
+				 * so just return and hope things keep muddling through. --Ben
+				 */
+				printk("offset: %i start: %i seq: %i size: %i copy: %i\n",
+				       offset, start, TCP_SKB_CB(skb)->seq, size, copy);
+				return;
+			}
 			if (size > 0) {
 				size = min(copy, size);
 				if (skb_copy_bits(skb, offset, skb_put(nskb, size), size))

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
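The workaround above replaces a fatal check with one that logs and bails out. As a rough userspace illustration of that pattern (a sketch only, under the assumptions noted in the comments), the macro `WARN_ON_SKETCH` and the function `collapse_copy_len()` below are hypothetical names, not kernel APIs. Unlike the patched kernel code, which simply returns from a void function, the sketch returns -1 so that a caller can observe the bail-out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for WARN_ON(): prints a warning when the
 * condition is true and evaluates to 1 or 0 so it can be used in if (). */
#define WARN_ON_SKETCH(cond) \
	((cond) ? (fprintf(stderr, "WARNING: %s at %s:%d\n", \
			   #cond, __FILE__, __LINE__), 1) : 0)

/* Mirrors only the offset/size arithmetic of the quoted hunk.
 * Returns the number of bytes that would be copied, 0 if there is nothing
 * to copy, or -1 if the sanity check fails. */
static int collapse_copy_len(unsigned int start, unsigned int seq,
			     unsigned int end_seq, int copy)
{
	/* Assumes the usual two's-complement conversion, as the kernel does. */
	int offset = (int)(start - seq);
	int size = (int)(end_seq - start);

	if (WARN_ON_SKETCH(offset < 0)) {
		/* Log the state and give up instead of aborting. */
		fprintf(stderr,
			"offset: %i start: %u seq: %u size: %i copy: %i\n",
			offset, start, seq, size, copy);
		return -1;
	}
	if (size > 0)
		return size < copy ? size : copy;
	return 0;
}
```

The trade-off is the one the commit message itself admits: the process keeps running, but the inconsistent state that produced the negative offset is still there, so the log line is what matters for finding the root cause.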
* Re: kernel panic in skb_copy_bits
From: Eric Dumazet @ 2013-06-29 16:26 UTC
To: Ben Greear
Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, Joe Jin, linux-kernel, Xen Devel, Jan Beulich, David S. Miller

On Sat, 2013-06-29 at 09:11 -0700, Ben Greear wrote:

> Do you know if your patch should go in 3.9?
>

Yes it should.

> Your test case sounds a bit like what gives us the rare crash in tcp_collapse
> (we have lots of bouncing wifi interfaces running slow-speed TCP traffic). But,
> it takes days for us to hit the problem most of the time.

Well, unfortunately that's a different problem :(
* Re: kernel panic in skb_copy_bits
From: Joe Jin @ 2013-06-30 0:26 UTC
To: Eric Dumazet
Cc: Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini

On 06/29/13 15:20, Eric Dumazet wrote:
> On Sat, 2013-06-29 at 07:36 +0800, Joe Jin wrote:
>> Hi Eric,
>>
>> The patch does not fix the issue; the panic is the same as the one I posted earlier:
>>> BUG: unable to handle kernel paging request at ffff88006d9e8d48
>>> IP: [<ffffffff812605bb>] memcpy+0xb/0x120
>>> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
>>> Oops: 0000 [#1] SMP
>>> CPU 7
>>> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>>>
>>>
>>> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
>
>
> By the way my patch was for current kernels, not for 2.6.39
>
> For instance, I was not able to reproduce the crash with 3.3
>
> RCU in neighbour code was added in 2.6.37, but it looks like this code
> is a bit fragile because all the kfree_skb() are done while neighbour
> locks are held.
>
> So if a skb destructor triggers a new call to neighbour code, I presume
> some bad things can happen. LOCKDEP could eventually help to detect
> this.
>
> You could try to replace these kfree_skb() calls with dev_kfree_skb_irq()
> just in case.
>
> (Do not forget the __skb_queue_purge() ones)
>
> Try a LOCKDEP build as well.

So far we suspect it is caused by iscsi calling sendpage(): the page is later
unmapped while the skb is still being copied. We'll try disabling sg to see
whether that helps.

Thanks,
Joe
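Eric's concern in the quoted text is that buffers are freed while the neighbour lock is held, so a destructor that calls back into neighbour code would run inside the locked region. His suggested `dev_kfree_skb_irq()` defers the actual free to a later context. The sketch below is a userspace model of that idea only: `struct buf`, `struct nqueue`, `purge_inline` and `purge_deferred` are hypothetical names, and the "lock" is a plain flag so the effect can be observed without threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct nqueue;

/* Hypothetical buffer, loosely modelling an skb with a destructor. */
struct buf {
    struct buf *next;
    struct nqueue *owner;
};

/* Hypothetical owner, loosely modelling a neighbour entry. */
struct nqueue {
    bool locked;           /* models neigh->lock being held */
    struct buf *head;      /* models neigh->arp_queue */
    int frees_under_lock;  /* destructors that ran with the lock held */
};

/* Destructor: records whether it ran while the owner was locked, which is
 * the situation where re-entering the owner's code would be dangerous. */
static void buf_destroy(struct buf *b)
{
    if (b->owner->locked)
        b->owner->frees_under_lock++;
    free(b);
}

static bool nqueue_add(struct nqueue *q)
{
    struct buf *b = malloc(sizeof(*b));

    if (!b)
        return false;
    b->owner = q;
    b->next = q->head;
    q->head = b;
    return true;
}

/* Fragile pattern: every buffer is destroyed while the lock is held. */
static void purge_inline(struct nqueue *q)
{
    q->locked = true;
    while (q->head) {
        struct buf *b = q->head;

        q->head = b->next;
        buf_destroy(b);
    }
    q->locked = false;
}

/* Deferred pattern: detach the list under the lock, destroy afterwards. */
static void purge_deferred(struct nqueue *q)
{
    struct buf *pending;

    q->locked = true;
    pending = q->head;
    q->head = NULL;
    q->locked = false;

    while (pending) {
        struct buf *b = pending;

        pending = b->next;
        buf_destroy(b);
    }
}
```

After `purge_inline()` the counter equals the number of queued buffers, while after `purge_deferred()` it stays at zero. The kernel mechanism differs in detail (the free is handed to softirq processing rather than done by the same caller), so this only illustrates why moving the free out of the locked region removes the re-entrancy hazard.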
* Re: kernel panic in skb_copy_bits
From: Eric Dumazet @ 2013-06-30 7:50 UTC
To: Joe Jin
Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Jan Beulich, David S. Miller

On Sun, 2013-06-30 at 08:26 +0800, Joe Jin wrote:

> So far we suspect it is caused by iscsi calling sendpage(): the page is later
> unmapped while the skb is still being copied. We'll try disabling sg to see
> whether that helps.

sendpage() should increment page refcounts for every page frag of an skb,
therefore the page should not be unmapped.

Of course userland can either rewrite the content, or unmap() the page,
but the underlying page cannot be freed as long as the skb is not freed.
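The refcounting argument in this reply can be made concrete with a small userspace model. This is a sketch under stated assumptions, not kernel code: `struct model_page`, `struct model_skb` and the `model_*` helpers are hypothetical stand-ins for a page, an skb with page fragments, and the get/put operations on page references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_FRAGS 4

/* Hypothetical page: it is released only when the last reference goes. */
struct model_page {
    int refcount;
    bool freed;
};

/* Hypothetical skb that refers to pages instead of copying their data. */
struct model_skb {
    struct model_page *frags[MODEL_MAX_FRAGS];
    int nr_frags;
};

static void model_get_page(struct model_page *p)
{
    p->refcount++;
}

static void model_put_page(struct model_page *p)
{
    if (--p->refcount == 0)
        p->freed = true;
}

/* Attaching a page as a fragment takes one reference on it. */
static bool model_skb_add_frag(struct model_skb *skb, struct model_page *p)
{
    if (skb->nr_frags >= MODEL_MAX_FRAGS)
        return false;
    model_get_page(p);
    skb->frags[skb->nr_frags++] = p;
    return true;
}

/* Freeing the skb drops the reference held for each fragment. */
static void model_skb_free(struct model_skb *skb)
{
    for (int i = 0; i < skb->nr_frags; i++)
        model_put_page(skb->frags[i]);
    skb->nr_frags = 0;
}
```

In this model the original owner can drop its own reference at any time and the page still survives until `model_skb_free()` runs. If a page really is gone while an skb still points at it, some path must have dropped a reference it did not own or bypassed the counting, which is the kind of bug the thread is hunting.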
* Re: kernel panic in skb_copy_bits
From: Joe Jin @ 2013-06-28 23:36 UTC
To: Eric Dumazet
Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Jan Beulich, David S. Miller

Hi Eric,

The patch does not fix the issue; the panic is the same as the one I posted earlier:

> BUG: unable to handle kernel paging request at ffff88006d9e8d48
> IP: [<ffffffff812605bb>] memcpy+0xb/0x120
> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0
> Oops: 0000 [#1] SMP
> CPU 7
> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache
>
>
> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246
> RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120
> RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246
> RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057
> RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280
> RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000
> R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034
> R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8
> FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000
> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
> CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240)
> Stack:
>  ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0
>  000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000
>  ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c
> Call Trace:
>  <IRQ>
>  [<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0
>  [<ffffffff8142f173>] skb_copy+0xf3/0x120
>  [<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350
>  [<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10
>  [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
>  [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110
>  [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180
>  [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220
>  [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0
>  [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70
>  [<ffffffff81511d3c>] call_softirq+0x1c/0x30
>  [<ffffffff810172e5>] do_softirq+0x65/0xa0
>  [<ffffffff8107656b>] irq_exit+0xab/0xc0
>  [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50
>  [<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30
>  <EOI>
>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
>  [<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20
>  [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170
>  [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0
>  [<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4
>  [<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10
> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c
> RIP [<ffffffff812605bb>] memcpy+0xb/0x120
>  RSP <ffff8801003c3d58>
> CR2: ffff88006d9e8d48

Thanks,
Joe

On 06/28/13 17:37, Eric Dumazet wrote:
> OK please try the following patch
>
>
> [PATCH] neighbour: fix a race in neigh_destroy()
>
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rwlock is what
> protects arp_queue
>
> Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
>
> Use __skb_queue_head_init() instead of skb_queue_head_init()
> to make clear we do not use arp_queue.lock
>
> And hold neigh->lock in neigh_destroy() to close the race.
>
> Reported-by: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/core/neighbour.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 2569ab2..b7de821 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -231,7 +231,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
>                            we must kill timers etc. and move
>                            it to safe state.
>                          */
> -                       skb_queue_purge(&n->arp_queue);
> +                       __skb_queue_purge(&n->arp_queue);
>                         n->arp_queue_len_bytes = 0;
>                         n->output = neigh_blackhole;
>                         if (n->nud_state & NUD_VALID)
> @@ -286,7 +286,7 @@ static struct neighbour *neigh_alloc(struct neigh_table *tbl, struct net_device
>         if (!n)
>                 goto out_entries;
>
> -       skb_queue_head_init(&n->arp_queue);
> +       __skb_queue_head_init(&n->arp_queue);
>         rwlock_init(&n->lock);
>         seqlock_init(&n->ha_lock);
>         n->updated = n->used = now;
> @@ -708,7 +708,9 @@ void neigh_destroy(struct neighbour *neigh)
>         if (neigh_del_timer(neigh))
>                 pr_warn("Impossible event\n");
>
> -       skb_queue_purge(&neigh->arp_queue);
> +       write_lock_bh(&neigh->lock);
> +       __skb_queue_purge(&neigh->arp_queue);
> +       write_unlock_bh(&neigh->lock);
>         neigh->arp_queue_len_bytes = 0;
>
>         if (dev->netdev_ops->ndo_neigh_destroy)
> @@ -858,7 +860,7 @@ static void neigh_invalidate(struct neighbour *neigh)
>                 neigh->ops->error_report(neigh, skb);
>                 write_lock(&neigh->lock);
>         }
> -       skb_queue_purge(&neigh->arp_queue);
> +       __skb_queue_purge(&neigh->arp_queue);
>         neigh->arp_queue_len_bytes = 0;
>  }
>
> @@ -1210,7 +1212,7 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>
>                         write_lock_bh(&neigh->lock);
>                 }
> -               skb_queue_purge(&neigh->arp_queue);
> +               __skb_queue_purge(&neigh->arp_queue);
>                 neigh->arp_queue_len_bytes = 0;
>         }
>  out:
>
>

-- 
Oracle <http://www.oracle.com>
Joe Jin | Software Development Senior Manager | +8610.6106.5624
ORACLE | Linux and Virtualization
No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing
* Re: kernel panic in skb_copy_bits
From: David Miller @ 2013-07-01 20:36 UTC
To: eric.dumazet
Cc: frank.blaschka, zheng.x.li, Ian.Campbell, stefano.stabellini, netdev, joe.jin, linux-kernel, xen-devel, JBeulich

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 28 Jun 2013 02:37:42 -0700

> [PATCH] neighbour: fix a race in neigh_destroy()
>
> There is a race in neighbour code, because neigh_destroy() uses
> skb_queue_purge(&neigh->arp_queue) without holding neighbour lock,
> while other parts of the code assume neighbour rwlock is what
> protects arp_queue
>
> Convert all skb_queue_purge() calls to the __skb_queue_purge() variant
>
> Use __skb_queue_head_init() instead of skb_queue_head_init()
> to make clear we do not use arp_queue.lock
>
> And hold neigh->lock in neigh_destroy() to close the race.
>
> Reported-by: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.
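The locking rule that the applied patch establishes is that one lock, the neighbour's own, protects the queue everywhere, and the purge helper itself takes no lock. A minimal userspace sketch of that discipline follows, assuming a pthread mutex in place of the kernel rwlock. `struct neigh_model`, `queue_purge_nolock`, `neigh_model_enqueue` and `neigh_model_destroy_queue` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical queued packet. */
struct pkt {
    struct pkt *next;
    size_t truesize;
};

/* Hypothetical neighbour entry: a single lock protects queue and counter. */
struct neigh_model {
    pthread_mutex_t lock;    /* stands in for neigh->lock */
    struct pkt *queue;       /* stands in for neigh->arp_queue */
    size_t queue_len_bytes;  /* stands in for arp_queue_len_bytes */
};

/* "__"-style helper: takes no lock itself, the caller must hold n->lock. */
static void queue_purge_nolock(struct neigh_model *n)
{
    while (n->queue) {
        struct pkt *p = n->queue;

        n->queue = p->next;
        free(p);
    }
}

/* Producer path: modifies the queue only under n->lock. */
static bool neigh_model_enqueue(struct neigh_model *n, size_t truesize)
{
    struct pkt *p = malloc(sizeof(*p));

    if (!p)
        return false;
    p->truesize = truesize;

    pthread_mutex_lock(&n->lock);
    p->next = n->queue;
    n->queue = p;
    n->queue_len_bytes += truesize;
    pthread_mutex_unlock(&n->lock);
    return true;
}

/* Destroy path: takes the same lock before purging, so it can no longer
 * run concurrently with a producer that relies on n->lock. */
static void neigh_model_destroy_queue(struct neigh_model *n)
{
    pthread_mutex_lock(&n->lock);
    queue_purge_nolock(n);
    n->queue_len_bytes = 0;
    pthread_mutex_unlock(&n->lock);
}
```

One deliberate difference from the patch: the sketch resets the byte counter inside the locked region, whereas the patched `neigh_destroy()` clears `arp_queue_len_bytes` after dropping the lock. That is a choice made for the model, not a claim about the kernel code.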
* Re: kernel panic in skb_copy_bits 2013-06-28 4:17 ` Joe Jin (?) (?) @ 2013-06-28 6:52 ` Eric Dumazet -1 siblings, 0 replies; 64+ messages in thread From: Eric Dumazet @ 2013-06-28 6:52 UTC (permalink / raw) To: Joe Jin Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Jan Beulich, David S. Miller On Fri, 2013-06-28 at 12:17 +0800, Joe Jin wrote: > Find a similar issue http://www.gossamer-threads.com/lists/xen/devel/265611 > So copied to Xen developer as well. > > On 06/27/13 13:31, Eric Dumazet wrote: > > On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote: > >> Hi, > >> > >> When we do fail over test with iscsi + multipath by reset the switches > >> on OVM(2.6.39) we hit the panic: > >> > >> BUG: unable to handle kernel paging request at ffff88006d9e8d48 > >> IP: [<ffffffff812605bb>] memcpy+0xb/0x120 > >> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0 > >> Oops: 0000 [#1] SMP > >> CPU 7 > >> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext! > 3! > > j! 
> >> bd mbcache > >> > >> > >> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc. PowerEdge 2950/0DP246 > >> RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120 > >> RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246 > >> RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057 > >> RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280 > >> RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000 > >> R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034 > >> R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8 > >> FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000 > >> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b > >> CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660 > >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > >> Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240) > >> Stack: > >> ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0 > >> 000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000 > >> ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c > >> Call Trace: > >> <IRQ> > >> [<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0 > >> [<ffffffff8142f173>] skb_copy+0xf3/0x120 > >> [<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350 > >> [<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10 > >> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180 > >> [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110 > >> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180 > >> [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220 > >> [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0 > >> [<ffffffff810d9678>] ? 
handle_percpu_irq+0x48/0x70 > >> [<ffffffff81511d3c>] call_softirq+0x1c/0x30 > >> [<ffffffff810172e5>] do_softirq+0x65/0xa0 > >> [<ffffffff8107656b>] irq_exit+0xab/0xc0 > >> [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50 > >> [<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30 > >> <EOI> > >> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 > >> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 > >> [<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20 > >> [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170 > >> [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0 > >> [<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4 > >> [<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10 > >> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c > >> RIP [<ffffffff812605bb>] memcpy+0xb/0x120 > >> RSP <ffff8801003c3d58> > >> CR2: ffff88006d9e8d48 > >> > >> Reviewed vmcore I found the skb->users is 1 at the moment, checked network neighbour > >> history I found skb_get() be replaced by skb_copy by commit 7e36763b2c: > >> > >> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25 > >> Author: Frank Blaschka <frank.blaschka@de.ibm.com> > >> Date: Mon Mar 3 12:16:04 2008 -0800 > >> > >> [NET]: Fix race in generic address resolution. > >> > >> neigh_update sends skb from neigh->arp_queue while neigh_timer_handler > >> has increased skbs refcount and calls solicit with the > >> skb. neigh_timer_handler should not increase skbs refcount but make a > >> copy of the skb and do solicit with the copy. > >> > >> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> > >> Signed-off-by: David S. Miller <davem@davemloft.net> > >> > >> So can you please give some details of the race? per vmcore seems like the skb data > >> be freed, I suspected skb_get() lost at somewhere? > >> I reverted above commit the panic not occurred during our testing. 
> >> > >> Any input will appreciate! > > > > Well, fact is that your crash is happening in skb_copy(). > > > > Frank patch is OK. I suspect using skb_clone() would work too, > > so if these skb were fclone ready, chance of an GFP_ATOMIC allocation > > error would be smaller. > > > > So something is providing a wrong skb at the very beginning. > > > > You could try to do a early skb_copy to catch the bug and see in the > > stack trace what produced this buggy skb. > > > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c > > index 5c56b21..a7a51fd 100644 > > --- a/net/core/neighbour.c > > +++ b/net/core/neighbour.c > > @@ -1010,6 +1010,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb) > > NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards); > > } > > skb_dst_force(skb); > > + kfree_skb(skb_copy(skb, GFP_ATOMIC)); > > __skb_queue_tail(&neigh->arp_queue, skb); > > neigh->arp_queue_len_bytes += skb->truesize; > > } > > > > > > BUG: unable to handle kernel paging request at ffff8800488db8dc > IP: [<ffffffff812605bb>] memcpy+0xb/0x120 > PGD 1796067 PUD 20e5067 PMD 212a067 PTE 0 > Oops: 0000 [#1] SMP > CPU 13 > Modules linked in: ocfs2 jbd2 xen_blkback xen_netback xen_gntdev xen_evtchn netconsole i2c_dev i2c_core ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs lockd sunrpc dm_round_robin dm_multipath bridge stp llc bonding be2iscsi iscsi_boot_sysfs iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc hed acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport serio_raw ixgbe hpilo tg3 hpwdt dca snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd iTCO_wdt iTCO_vendor_support soundcore snd_page_alloc pcspkr pata_acpi ata_generic dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ata_piix sg sh pchp hpsa cciss sd_mod crc_t10dif ext3 jbd mbcache 
> > Pid: 0, comm: swapper Not tainted 2.6.39-300.32.1.el5uek.bug16929255v5 #1 HP ProLiant DL360p Gen8 > RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120 > RSP: e02b:ffff88005a9a3b68 EFLAGS: 00010202 > RAX: ffff8800200f0280 RBX: 0000000000000724 RCX: 00000000000000e4 > RDX: 0000000000000004 RSI: ffff8800488db8dc RDI: ffff8800200f0280 > RBP: ffff88005a9a3bd0 R08: 0000000000000004 R09: ffff880052824980 > R10: 0000000000000000 R11: 0000000000015048 R12: 0000000000000034 > R13: 0000000000000034 R14: 00000000000022f4 R15: ffff880021208ab0 > FS: 00007fe8737c96e0(0000) GS:ffff88005a9a0000(0000) knlGS:0000000000000000 > CS: e033 DS: 002b ES: 002b CR0: 000000008005003b > CR2: ffff8800488db8dc CR3: 000000004fb38000 CR4: 0000000000002660 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper (pid: 0, threadinfo ffff880054d36000, task ffff880054d343c0) > Stack: > ffffffff8142dac7 0000000000000000 00000000ffffffff ffff8800200f0280 > 0000075800000000 0000000000000724 ffff880054d36000 0000000000000000 > 00000000fffffdb4 ffff880052824980 ffff880021208ab0 000000000000024c > Call Trace: > <IRQ> > [<ffffffff8142dac7>] ? skb_copy_bits+0x167/0x290 > [<ffffffff8142f0b5>] skb_copy+0x85/0xb0 > [<ffffffff8144864d>] __neigh_event_send+0x18d/0x200 > [<ffffffff81449a42>] neigh_resolve_output+0x162/0x1b0 > [<ffffffff81477046>] ip_finish_output+0x146/0x320 > [<ffffffff814754a5>] ip_output+0x85/0xd0 > [<ffffffff814758d9>] ip_local_out+0x29/0x30 > [<ffffffff814761e0>] ip_queue_xmit+0x1c0/0x3d0 > [<ffffffff8148d3ef>] tcp_transmit_skb+0x40f/0x520 > [<ffffffff8148e5ff>] tcp_retransmit_skb+0x16f/0x2e0 > [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0 > [<ffffffff814905ad>] tcp_retransmit_timer+0x18d/0x4a0 > [<ffffffff814908c0>] ? 
tcp_retransmit_timer+0x4a0/0x4a0 > [<ffffffff81490994>] tcp_write_timer+0xd4/0x100 > [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110 > [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0 > [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220 > [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0 > [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70 > [<ffffffff81511b7c>] call_softirq+0x1c/0x30 > [<ffffffff810172e5>] do_softirq+0x65/0xa0 > [<ffffffff8107656b>] irq_exit+0xab/0xc0 > [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50 > [<ffffffff81511bce>] xen_do_hypervisor_callback+0x1e/0x30 > <EOI> > [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 > [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 > [<ffffffff8100a0d0>] ? xen_safe_halt+0x10/0x20 > [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170 > [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0 > [<ffffffff8100a8e9>] ? xen_irq_enable_direct_reloc+0x4/0x4 > [<ffffffff814f7a2e>] ? cpu_bringup_and_idle+0xe/0x10 > Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c > RIP [<ffffffff812605bb>] memcpy+0xb/0x120 > > > Per vmcore, the socket info as below: > ------------------------------------------------------------------------------ > <struct tcp_sock 0xffff88004d344e00> TCP > tcp 10.1.1.11:42147 10.1.1.21:3260 FIN_WAIT1 > windows: rcv=122124, snd=65535 advmss=8948 rcv_ws=1 snd_ws=0 > nonagle=1 sack_ok=0 tstamp_ok=1 > rmem_alloc=0, wmem_alloc=10229 > rx_queue=0, tx_queue=149765 > rcvbuf=262142, sndbuf=262142 > rcv_tstamp=51.4 s, lsndtime=0.0 s ago > -- Retransmissions -- > retransmits=7, ca_state=TCP_CA_Disorder > ------------------------------------------------------------------------------ > > When sock status move to FIN_WAIT1, will it cleanup all skb or no? I get crashes as well using UDP application. Its not related to TCP. There is some corruption going on in neighbour code. 
[ 942.319645] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 942.327510] IP: [<ffffffff814e4558>] __neigh_event_send+0x1a8/0x240
[ 942.333799] PGD c5a125067 PUD c603e1067 PMD 0
[ 942.338292] Oops: 0002 [#1] SMP
[ 942.341819] gsmi: Log Shutdown Reason 0x03
[ 942.364995] CPU: 8 PID: 13760 Comm: netperf Tainted: G W 3.10.0-smp-DEV #155
[ 942.380212] task: ffff88065b54b000 ti: ffff8806498fc000 task.ti: ffff8806498fc000
[ 942.387689] RIP: 0010:[<ffffffff814e4558>] [<ffffffff814e4558>] __neigh_event_send+0x1a8/0x240
[ 942.396402] RSP: 0018:ffff8806498fd9d8 EFLAGS: 00010206
[ 942.401709] RAX: 0000000000000000 RBX: ffff88065a8f9000 RCX: ffff88065fdf61c0
[ 942.408837] RDX: 0000000000000000 RSI: ffff880c5d5b3080 RDI: ffff880c5b9c0ac0
[ 942.415966] RBP: ffff8806498fd9f8 R08: ffff88064cb00000 R09: ffff8806498fda70
[ 942.423095] R10: ffff880c5ffbead0 R11: ffffffff815137d0 R12: ffff88065a8f9030
[ 942.430232] R13: ffff880c5d5b3080 R14: 0000000000000000 R15: ffff88065b4af940
[ 942.437362] FS: 00007fd613190700(0000) GS:ffff880c7fc40000(0000) knlGS:0000000000000000
[ 942.445452] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 942.451193] CR2: 0000000000000008 CR3: 0000000c59b60000 CR4: 00000000000007e0
[ 942.458324] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 942.465460] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 942.472597] Stack:
[ 942.475997] ffff880c5d5b3080 ffff88065a8f9000 ffff880c59ac43c0 0000000000000088
[ 942.483473] ffff8806498fda48 ffffffff814e50db ffff880c5d5b3080 ffffffff81514c60
[ 942.490947] 0000000000000088 ffff88064cb00000 ffff880c5d5b3080 ffff880c59ac43c0
[ 942.498415] Call Trace:
[ 942.500873] [<ffffffff814e50db>] neigh_resolve_output+0x14b/0x1f0
[ 942.507056] [<ffffffff81514c60>] ? __ip_append_data.isra.39+0x9e0/0x9e0
[ 942.515138] [<ffffffff81514ddf>] ip_finish_output+0x17f/0x380
[ 942.520972] [<ffffffff81515bb3>] ip_output+0x53/0x90
[ 942.526030] [<ffffffff815167d6>] ? ip_make_skb+0xf6/0x120
[ 942.532897] [<ffffffff81515379>] ip_local_out+0x29/0x30
[ 942.538215] [<ffffffff81516649>] ip_send_skb+0x19/0x50
[ 942.544825] [<ffffffff8153a65e>] udp_send_skb+0x2ce/0x3a0
[ 942.551439] [<ffffffff815137d0>] ? ip_setup_cork+0x110/0x110
^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits 2013-06-28 4:17 ` Joe Jin @ 2013-06-30 9:13 ` Alex Bligh -1 siblings, 0 replies; 64+ messages in thread From: Alex Bligh @ 2013-06-30 9:13 UTC (permalink / raw) To: Joe Jin, Eric Dumazet Cc: Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini, Alex Bligh --On 28 June 2013 12:17:43 +0800 Joe Jin <joe.jin@oracle.com> wrote: > Find a similar issue > http://www.gossamer-threads.com/lists/xen/devel/265611 So copied to Xen > developer as well. I thought this sounded familiar. I haven't got the start of this thread, but what version of Xen are you running and what device model? If before 4.3, there is a page lifetime bug in the kernel (not the xen code) which can affect anything where the guest accesses the host's block stack and that in turn accesses the networking stack (it may in fact be wider than that). So, e.g. domU on iSCSI will do it. It tends to get triggered by a TCP retransmit or (on NFS) the RPC equivalent. Essentially the block operation is considered complete, returning through xen and freeing the grant table entry, and yet something in the kernel (e.g. tcp retransmit) can still access the data. The nature of the bug is extensively discussed in that thread - you'll also find a reference to a thread on linux-nfs which concludes it isn't an nfs problem, and even some patches to fix it in the kernel adding reference counting. A workaround is to turn off O_DIRECT use by Xen as that ensures the pages are copied. Xen 4.3 does this by default. I believe fixes for this are in 4.3 and 4.2.2 if using the qemu upstream DM. Note these aren't real fixes, just a workaround of a kernel bug.
To fix on a local build of xen you will need something like this: https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9 and something like this (NB: obviously insert your own git repo and commit numbers) https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca Also note those fixes are (technically) unsafe for live migration unless there is an ordering change made in qemu's block open call. Of course this might be something completely different. -- Alex Bligh ^ permalink raw reply [flat|nested] 64+ messages in thread
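The page lifetime problem described above, and why copying the pages works around it, can be sketched as a small user-space C model. This is only an illustration under stated assumptions: every name in it (toy_page, toy_skb, toy_send_zero_copy and so on) is hypothetical and does not correspond to any kernel or Xen symbol, and a return value of -1 stands in for the page fault a real kernel would take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 64  /* toy size, not a real page size */

/* Hypothetical stand-in for a granted guest page mapped into dom0. */
struct toy_page {
    bool mapped;              /* false once the grant has been unmapped */
    char data[PAGE_BYTES];
};

/* Hypothetical stand-in for a queued TCP segment. */
struct toy_skb {
    const struct toy_page *frag;  /* zero-copy path: points at the guest page */
    char copy[PAGE_BYTES];        /* copy path: private buffer */
    bool uses_copy;
};

/* Zero-copy send (O_DIRECT-like path): the segment only references the page. */
static void toy_send_zero_copy(struct toy_skb *skb, const struct toy_page *pg)
{
    skb->frag = pg;
    skb->uses_copy = false;
}

/* Copying send (buffered path): the segment owns its own copy of the data. */
static void toy_send_copy(struct toy_skb *skb, const struct toy_page *pg)
{
    assert(pg->mapped);
    memcpy(skb->copy, pg->data, PAGE_BYTES);
    skb->frag = NULL;
    skb->uses_copy = true;
}

/* Block I/O completion: the backend unmaps the grant straight away. */
static void toy_complete_io(struct toy_page *pg)
{
    pg->mapped = false;
}

/*
 * Retransmit timer: rebuild the segment payload into out.
 * Returns 0 on success, -1 where the real kernel would take a page fault.
 */
static int toy_retransmit(const struct toy_skb *skb, char *out)
{
    if (skb->uses_copy) {
        memcpy(out, skb->copy, PAGE_BYTES);
        return 0;
    }
    if (!skb->frag->mapped)
        return -1;
    memcpy(out, skb->frag->data, PAGE_BYTES);
    return 0;
}
```

In this model, calling toy_complete_io() before toy_retransmit() fails for a segment queued with toy_send_zero_copy() but still succeeds for one queued with toy_send_copy(), which is the effect the O_DIRECT workaround relies on.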
* Re: kernel panic in skb_copy_bits 2013-06-30 9:13 ` Alex Bligh (?) @ 2013-06-30 9:35 ` Alex Bligh -1 siblings, 0 replies; 64+ messages in thread From: Alex Bligh @ 2013-06-30 9:35 UTC (permalink / raw) To: Alex Bligh, Joe Jin, Eric Dumazet Cc: Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini, Alex Bligh --On 30 June 2013 10:13:35 +0100 Alex Bligh <alex@alex.org.uk> wrote: > The nature of the bug > is extensively discussed in that thread - you'll also find > a reference to a thread on linux-nfs which concludes it > isn't an nfs problem, and even some patches to fix it in the > kernel adding reference counting. Some more links for anyone interested in fixing the kernel bug: http://lists.xen.org/archives/html/xen-devel/2013-01/msg01618.html http://www.spinics.net/lists/linux-nfs/msg34913.html http://www.spinics.net/lists/netdev/msg224106.html -- Alex Bligh ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits 2013-06-30 9:13 ` Alex Bligh ` (2 preceding siblings ...) (?) @ 2013-07-01 3:18 ` Joe Jin 2013-07-01 8:11 ` Ian Campbell ` (3 more replies) -1 siblings, 4 replies; 64+ messages in thread From: Joe Jin @ 2013-07-01 3:18 UTC (permalink / raw) To: Alex Bligh Cc: Eric Dumazet, Frank Blaschka, David S. Miller, linux-kernel, netdev, zheng.x.li, Xen Devel, Ian Campbell, Jan Beulich, Stefano Stabellini On 06/30/13 17:13, Alex Bligh wrote: > > > --On 28 June 2013 12:17:43 +0800 Joe Jin <joe.jin@oracle.com> wrote: > >> Find a similar issue >> http://www.gossamer-threads.com/lists/xen/devel/265611 So copied to Xen >> developer as well. > > I thought this sounded familiar. I haven't got the start of this > thread, but what version of Xen are you running and what device > model? If before 4.3, there is a page lifetime bug in the kernel > (not the xen code) which can affect anything where the guest accesses > the host's block stack and that in turn accesses the networking > stack (it may in fact be wider than that). So, e.g. domU on > iSCSI will do it. It tends to get triggered by a TCP retransmit > or (on NFS) the RPC equivalent. Essentially block operation > is considered complete, returning through xen and freeing the > grant table entry, and yet something in the kernel (e.g. tcp > retransmit) can still access the data. The nature of the bug > is extensively discussed in that thread - you'll also find > a reference to a thread on linux-nfs which concludes it > isn't an nfs problem, and even some patches to fix it in the > kernel adding reference counting. Do you know if there is a fix for the above? So far we also suspect that the grant page is being unmapped too early; we were using 4.1 stable during our test. > > A workaround is to turn off O_DIRECT use by Xen as that ensures > the pages are copied. Xen 4.3 does this by default. > > I believe fixes for this are in 4.3 and 4.2.2 if using the > qemu upstream DM.
Note these aren't real fixes, just a workaround > of a kernel bug. The guest is PVM and the disk model is xvbd; the guest config file is below:
vif = ['mac=00:21:f6:00:00:01,bridge=c0a80b00']
OVM_simple_name = 'Guest#1'
disk = ['file:/OVS/Repositories/0004fb000003000091e9eae94d1e907c/VirtualDisks/0004fb0000120000f78799dad800ef47.img,xvda,w', 'phy:/dev/mapper/360060e8010141870058b415700000002,xvdb,w', 'phy:/dev/mapper/360060e8010141870058b415700000003,xvdc,w']
bootargs = ''
uuid = '0004fb00-0006-0000-2b00-77a4766001ed'
on_reboot = 'restart'
cpu_weight = 27500
OVM_os_type = 'Oracle Linux 5'
cpu_cap = 0
maxvcpus = 8
OVM_high_availability = False
memory = 4096
OVM_description = ''
on_poweroff = 'destroy'
on_crash = 'restart'
bootloader = '/usr/bin/pygrub'
guest_os_type = 'linux'
name = '0004fb00000600002b0077a4766001ed'
vfb = ['type=vnc,vncunused=1,vnclisten=127.0.0.1,keymap=en-us']
vcpus = 8
OVM_cpu_compat_group = ''
OVM_domain_type = 'xen_pvm'
> > To fix on a local build of xen you will need something like this: > https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9 > and something like this (NB: obviously insert your own git > repo and commit numbers) > https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca > I think this is only for pvhvm/hvm? Thanks, Joe > Also note those fixes are (technically) unsafe for live migration > unless there is an ordering change made in qemu's block open > call. > > Of course this might be something completely different. > ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits 2013-07-01 3:18 ` Joe Jin @ 2013-07-01 8:11 ` Ian Campbell 2013-07-01 8:11 ` Ian Campbell ` (2 subsequent siblings) 3 siblings, 0 replies; 64+ messages in thread From: Ian Campbell @ 2013-07-01 8:11 UTC (permalink / raw) To: Joe Jin Cc: Frank Blaschka, zheng.x.li, Jan Beulich, Eric Dumazet, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Alex Bligh, David S. Miller On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote: > > A workaround is to turn off O_DIRECT use by Xen as that ensures > > the pages are copied. Xen 4.3 does this by default. > > > > I believe fixes for this are in 4.3 and 4.2.2 if using the > > qemu upstream DM. Note these aren't real fixes, just a workaround > > of a kernel bug. > > The guest is pvm, and disk model is xvbd, guest config file as below: Do you know which disk backend? The workaround Alex refers to went into qdisk but I think blkback could still suffer from a variant of the retransmit issue if you run it over iSCSI. > > To fix on a local build of xen you will need something like this: > > https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9 > > and something like this (NB: obviously insert your own git > > repo and commit numbers) > > https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca > > > > I think this only for pvhvm/hvm? No, the underlying issue affects any PV device which is run over a network protocol (NFS, iSCSI etc). In effect a delayed retransmit can cross over the delayed ack and cause I/O to be completed while retransmits are pending, such as is described in http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS variant). The problem is that because Xen PV drivers often unmap the page on I/O completion you get a crash (page fault) on the retransmit. The issue also affects native but in that case the symptom is "just" a corrupt packet on the wire.
I tried to address this with my "skb destructor" series but unfortunately I got bogged down on the details, then I had to take time out to look into some other stuff and never managed to get back into it. I'd be very grateful if there was someone who could pick up that work (Alex gave some useful references in another reply to this thread) Some PV disk backends (e.g. blktap2) have worked around this by using grant copy instead of grant map, others (e.g. qdisk) have disabled O_DIRECT so that the pages are copied into the dom0 page cache and transmitted from there. We were discussing recently the possibility of mapping all ballooned out pages to a single read-only scratch page instead of leaving them empty in the page tables, this would cause the Xen case to revert to the native case. I think Thanos was going to take a look into this. Ian. ^ permalink raw reply [flat|nested] 64+ messages in thread
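The scratch-page idea mentioned above (pointing unmapped pages at one shared read-only page rather than leaving the page-table entry empty) can be illustrated with a toy user-space C model. This is a sketch under loud assumptions: toy_pte, toy_scratch and the toy_* functions are hypothetical names invented for illustration, a NULL target models an empty page-table entry, and -1 models a fault. It says nothing about how Xen or Linux would actually implement such a change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 64  /* toy size, not a real page size */

/* One shared, zero-filled buffer standing in for the read-only scratch page. */
static const char toy_scratch[PAGE_BYTES] = { 0 };

/* Hypothetical page-table slot: a NULL target models an empty (faulting) entry. */
struct toy_pte {
    const char *target;
};

/* Unmap as in the crashing case: leave the slot empty. */
static void toy_unmap_empty(struct toy_pte *pte)
{
    pte->target = NULL;
}

/* Unmap under the discussed idea: point the slot at the scratch page instead. */
static void toy_unmap_to_scratch(struct toy_pte *pte)
{
    pte->target = toy_scratch;
}

/*
 * Late retransmit reading through the slot.
 * Returns -1 for a fault, 0 if bytes were read (possibly the wrong ones).
 */
static int toy_late_read(const struct toy_pte *pte, char *out)
{
    assert(out != NULL);
    if (pte->target == NULL)
        return -1;
    memcpy(out, pte->target, PAGE_BYTES);
    return 0;
}
```

With toy_unmap_to_scratch() a late read no longer faults but returns scratch bytes rather than the original payload, which corresponds to the "corrupt packet on the wire" symptom of the native case.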
* Re: kernel panic in skb_copy_bits 2013-07-01 8:11 ` Ian Campbell @ 2013-07-01 13:00 ` Joe Jin 2013-07-01 13:00 ` Joe Jin ` (2 subsequent siblings) 3 siblings, 0 replies; 64+ messages in thread From: Joe Jin @ 2013-07-01 13:00 UTC (permalink / raw) To: Ian Campbell Cc: Frank Blaschka, zheng.x.li, Jan Beulich, Eric Dumazet, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Alex Bligh, David S. Miller On 07/01/13 16:11, Ian Campbell wrote: > On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote: >>> A workaround is to turn off O_DIRECT use by Xen as that ensures >>> the pages are copied. Xen 4.3 does this by default. >>> >>> I believe fixes for this are in 4.3 and 4.2.2 if using the >>> qemu upstream DM. Note these aren't real fixes, just a workaround >>> of a kernel bug. >> >> The guest is pvm, and disk model is xvbd, guest config file as below: > > Do you know which disk backend? The workaround Alex refers to went into > qdisk but I think blkback could still suffer from a variant of the > retransmit issue if you run it over iSCSI. The backend is xen-blkback on iSCSI storage. > >>> To fix on a local build of xen you will need something like this: >>> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9 >>> and something like this (NB: obviously insert your own git >>> repo and commit numbers) >>> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca >>> >> >> I think this only for pvhvm/hvm? > > No, the underlying issue affects any PV device which is run over a > network protocol (NFS, iSCSI etc). In effect a delayed retransmit can > cross over the deayed ack and cause I/O to be completed while > retransmits are pending, such as is described in > http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS > variant). The problem is that because Xen PV drivers often unmap the > page on I/O completion you get a crash (page fault) on the retransmit. 
> To prevent iSCSI from calling sendpage() and reusing the page, we disabled scatter-gather (sg) on the NIC; according to our test results the panic went away. This also confirms that the page is being unmapped by the grant system, and the symptom is the same as the NFS panic. > The issue also affects native but in that case the symptom is "just" a > corrupt packet on the wire. I tried to address this with my "skb > destructor" series but unfortunately I got bogged down on the details, > then I had to take time out to look into some other stuff and never > managed to get back into it. I'd be very grateful if there was someone > who could pick up that work (Alex gave some useful references in another > reply to this thread) > > Some PV disk backends (e.g. blktap2) have worked around this by using > grant copy instead of grant map, others (e.g. qdisk) have disabled > O_DIRECT so that the pages are copied into the dom0 page cache and > transmitted from there. That workaround has the same effect as disabling sg on the NIC (with sg disabled, sendpage will create its own copy of the page rather than reuse the page). Thanks, Joe > > We were discussing recently the possibility of mapping all ballooned out > pages to a single read-only scratch page instead of leaving them empty > in the page tables, this would cause the Xen case to revert to the > native case. I think Thanos was going to take a look into this. > > Ian. > ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits 2013-07-01 8:11 ` Ian Campbell 2013-07-01 13:00 ` Joe Jin 2013-07-01 13:00 ` Joe Jin @ 2013-07-04 8:55 ` Joe Jin 2013-07-04 8:55 ` Joe Jin 3 siblings, 0 replies; 64+ messages in thread From: Joe Jin @ 2013-07-04 8:55 UTC (permalink / raw) To: Ian Campbell Cc: Frank Blaschka, zheng.x.li, Jan Beulich, Eric Dumazet, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Alex Bligh, David S. Miller On 07/01/13 16:11, Ian Campbell wrote: > On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote: >>> A workaround is to turn off O_DIRECT use by Xen as that ensures >>> the pages are copied. Xen 4.3 does this by default. >>> >>> I believe fixes for this are in 4.3 and 4.2.2 if using the >>> qemu upstream DM. Note these aren't real fixes, just a workaround >>> of a kernel bug. >> >> The guest is pvm, and disk model is xvbd, guest config file as below: > > Do you know which disk backend? The workaround Alex refers to went into > qdisk but I think blkback could still suffer from a variant of the > retransmit issue if you run it over iSCSI. > >>> To fix on a local build of xen you will need something like this: >>> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9 >>> and something like this (NB: obviously insert your own git >>> repo and commit numbers) >>> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca >>> >> >> I think this only for pvhvm/hvm? > > No, the underlying issue affects any PV device which is run over a > network protocol (NFS, iSCSI etc). In effect a delayed retransmit can > cross over the deayed ack and cause I/O to be completed while > retransmits are pending, such as is described in > http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS > variant). The problem is that because Xen PV drivers often unmap the > page on I/O completion you get a crash (page fault) on the retransmit. 
> Could we handle this by remembering the grant page's refcount when mapping and, at unmap time, checking whether the refcount is the same as it was at mapping? This change would be limited to xen-blkback. Another way is to add a new page flag such as PG_send: set the bit when sendpage() is called and clear it when the page is put. Then xen-blkback can wait on the pagequeue. Thanks, Joe > The issue also affects native but in that case the symptom is "just" a > corrupt packet on the wire. I tried to address this with my "skb > destructor" series but unfortunately I got bogged down on the details, > then I had to take time out to look into some other stuff and never > managed to get back into it. I'd be very grateful if there was someone > who could pick up that work (Alex gave some useful references in another > reply to this thread) > > Some PV disk backends (e.g. blktap2) have worked around this by using > grant copy instead of grant map, others (e.g. qdisk) have disabled > O_DIRECT so that the pages are copied into the dom0 page cache and > transmitted from there. > > We were discussing recently the possibility of mapping all ballooned out > pages to a single read-only scratch page instead of leaving them empty > in the page tables, this would cause the Xen case to revert to the > native case. I think Thanos was going to take a look into this. > > Ian. > -- Oracle <http://www.oracle.com> Joe Jin | Software Development Senior Manager | +8610.6106.5624 ORACLE | Linux and Virtualization No. 24 Zhongguancun Software Park, Haidian District | 100193 Beijing ^ permalink raw reply [flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits
  2013-07-04  8:55 ` Joe Jin
@ 2013-07-04  8:59 ` Ian Campbell
  2013-07-04  8:59 ` Ian Campbell
  1 sibling, 0 replies; 64+ messages in thread
From: Ian Campbell @ 2013-07-04  8:59 UTC (permalink / raw)
  To: Joe Jin
  Cc: Frank Blaschka, zheng.x.li, Jan Beulich, Eric Dumazet,
      Stefano Stabellini, netdev, linux-kernel, Xen Devel, Alex Bligh,
      David S. Miller

On Thu, 2013-07-04 at 16:55 +0800, Joe Jin wrote:
> On 07/01/13 16:11, Ian Campbell wrote:
> > On Mon, 2013-07-01 at 11:18 +0800, Joe Jin wrote:
> >>> A workaround is to turn off O_DIRECT use by Xen as that ensures
> >>> the pages are copied. Xen 4.3 does this by default.
> >>>
> >>> I believe fixes for this are in 4.3 and 4.2.2 if using the
> >>> qemu upstream DM. Note these aren't real fixes, just a workaround
> >>> of a kernel bug.
> >>
> >> The guest is pvm, and disk model is xvbd, guest config file as below:
> >
> > Do you know which disk backend? The workaround Alex refers to went into
> > qdisk but I think blkback could still suffer from a variant of the
> > retransmit issue if you run it over iSCSI.
> >
> >>> To fix on a local build of xen you will need something like this:
> >>> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
> >>> and something like this (NB: obviously insert your own git
> >>> repo and commit numbers)
> >>> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
> >>>
> >>
> >> I think this only for pvhvm/hvm?
> >
> > No, the underlying issue affects any PV device which is run over a
> > network protocol (NFS, iSCSI etc). In effect a delayed retransmit can
> > cross over the delayed ack and cause I/O to be completed while
> > retransmits are pending, such as is described in
> > http://www.spinics.net/lists/linux-nfs/msg34913.html (the original NFS
> > variant). The problem is that because Xen PV drivers often unmap the
> > page on I/O completion you get a crash (page fault) on the retransmit.
> >
>
> Can we do it by remembering the grant page's refcount when mapping and,
> at unmap time, checking whether the refcount is the same as it was at
> mapping? This change would be limited to xen-blkback.
>
> Another way is to add a new page flag like PG_send: when sendpage() is
> called, set the bit; when the page is put, clear the bit. Then
> xen-blkback can wait on the pagequeue.

These schemes don't work when you have multiple simultaneous I/Os
referencing the same underlying page.

>
> Thanks,
> Joe
>
> > The issue also affects native but in that case the symptom is "just" a
> > corrupt packet on the wire. I tried to address this with my "skb
> > destructor" series but unfortunately I got bogged down on the details,
> > then I had to take time out to look into some other stuff and never
> > managed to get back into it. I'd be very grateful if there was someone
> > who could pick up that work (Alex gave some useful references in another
> > reply to this thread)
> >
> > Some PV disk backends (e.g. blktap2) have worked around this by using
> > grant copy instead of grant map, others (e.g. qdisk) have disabled
> > O_DIRECT so that the pages are copied into the dom0 page cache and
> > transmitted from there.
> >
> > We were discussing recently the possibility of mapping all ballooned out
> > pages to a single read-only scratch page instead of leaving them empty
> > in the page tables, this would cause the Xen case to revert to the
> > native case. I think Thanos was going to take a look into this.
> >
> > Ian.
> >
>
>

^ permalink raw reply	[flat|nested] 64+ messages in thread
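Ian's objection can be illustrated with a toy model. The sketch below is one possible reading of why a refcount snapshot is ambiguous; the thread does not spell the scenario out. `toy_page`, `toy_io` and the `toy_*` helpers are hypothetical names, a plain integer stands in for the real page refcount, and the scheme is assumed to record the count at map time and to treat the page as safe to unmap once the count equals that value again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count matters here. */
struct toy_page {
    int refcount;
};

/* Hypothetical per-I/O record: the refcount observed when the page was mapped. */
struct toy_io {
    struct toy_page *page;
    int refcount_at_map;
};

static void toy_get(struct toy_page *p)
{
    p->refcount++;
}

static void toy_put(struct toy_page *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}

/* Map: remember the current refcount. */
static struct toy_io toy_map(struct toy_page *p)
{
    struct toy_io io = { .page = p, .refcount_at_map = p->refcount };
    return io;
}

/* The proposed unmap check: "safe" if the count is back at the snapshot. */
static bool toy_unmap_looks_safe(const struct toy_io *io)
{
    return io->page->refcount == io->refcount_at_map;
}

/*
 * Two overlapping I/Os (A and B) on one page. Returns true if the check
 * reports "safe" for A while a reference taken on A's behalf is still held.
 */
static bool toy_false_safe_scenario(void)
{
    struct toy_page page = { .refcount = 1 };  /* base reference */

    toy_get(&page);                    /* reference held for I/O B: 2 */
    struct toy_io a = toy_map(&page);  /* A snapshots 2, which includes B */
    toy_get(&page);                    /* reference held for I/O A: 3 */
    toy_put(&page);                    /* B's reference is dropped: 2 */

    /* A's reference is still outstanding, yet the count equals the snapshot. */
    return toy_unmap_looks_safe(&a);
}
```

In this model `toy_false_safe_scenario()` returns true: A's snapshot included a reference held for B, so once B's reference goes away the count matches the snapshot even though the page is still in use for A.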
* Re: kernel panic in skb_copy_bits
  2013-07-04  8:59 ` Ian Campbell
@ 2013-07-04  9:34 ` Eric Dumazet
  2013-07-04  9:34 ` Eric Dumazet
  1 sibling, 0 replies; 64+ messages in thread
From: Eric Dumazet @ 2013-07-04  9:34 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Frank Blaschka, zheng.x.li, Jan Beulich, Stefano Stabellini,
      netdev, Joe Jin, linux-kernel, Xen Devel, Alex Bligh,
      David S. Miller

On Thu, 2013-07-04 at 09:59 +0100, Ian Campbell wrote:
> On Thu, 2013-07-04 at 16:55 +0800, Joe Jin wrote:
> >
> > Another way is to add a new page flag like PG_send: when sendpage() is
> > called, set the bit; when the page is put, clear the bit. Then
> > xen-blkback can wait on the pagequeue.
>
> These schemes don't work when you have multiple simultaneous I/Os
> referencing the same underlying page.

So this is a page property, yet the patches I saw tried to address
this problem by adding networking stuff (destructors) in the skbs.

Given that a page refcount can be transferred between entities, say using
the splice() system call, I do not really understand why the fix would
involve networking only.

Let's try to fix it properly, or else we must disable zero copies
because they are not reliable.

Why doesn't sendfile() have the problem, while vmsplice()+splice() does
have this issue?

As soon as a page fragment reference is taken somewhere, the only way to
properly reuse the page is to rely on put_page() and the page being freed.

Adding workarounds in the TCP stack to always copy the page fragments in
case of a retransmit is a partial solution, as the remote peer could be
malicious and send an ACK _before_ the page content is actually read by
the NIC.

So if we rely on networking stacks to give the signal for page reuse, we
can have a major security issue.

^ permalink raw reply	[flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits
  2013-07-04  9:34 ` Eric Dumazet
@ 2013-07-04  9:52 ` Ian Campbell
  2013-07-04  9:52 ` Ian Campbell
  1 sibling, 0 replies; 64+ messages in thread
From: Ian Campbell @ 2013-07-04  9:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Frank Blaschka, zheng.x.li, Jan Beulich, Stefano Stabellini,
      netdev, Joe Jin, linux-kernel, Xen Devel, Alex Bligh,
      David S. Miller

On Thu, 2013-07-04 at 02:34 -0700, Eric Dumazet wrote:
> On Thu, 2013-07-04 at 09:59 +0100, Ian Campbell wrote:
> > On Thu, 2013-07-04 at 16:55 +0800, Joe Jin wrote:
> > >
> > > Another way is to add a new page flag like PG_send: when sendpage()
> > > is called, set the bit; when the page is put, clear the bit. Then
> > > xen-blkback can wait on the pagequeue.
> >
> > These schemes don't work when you have multiple simultaneous I/Os
> > referencing the same underlying page.
>
> So this is a page property, yet the patches I saw tried to address
> this problem by adding networking stuff (destructors) in the skbs.
>
> Given that a page refcount can be transferred between entities, say using
> the splice() system call, I do not really understand why the fix would
> involve networking only.
>
> Let's try to fix it properly, or else we must disable zero copies
> because they are not reliable.
>
> Why doesn't sendfile() have the problem, while vmsplice()+splice() does
> have this issue?

Might just be that no one has observed it with vmsplice()+splice()? Most
of the time this happens silently and you'll probably never notice, it's
just the behaviour of Xen which escalates the issue into one you can
see.

> As soon as a page fragment reference is taken somewhere, the only way to
> properly reuse the page is to rely on put_page() and the page being freed.

Xen's out of tree netback used to fix this by a destructor callback on
page free, but that was a core mm patch in the hot memory free path
which wasn't popular, and it doesn't solve anything for the non-Xen
instances of this issue.

> Adding workarounds in the TCP stack to always copy the page fragments in
> case of a retransmit is a partial solution, as the remote peer could be
> malicious and send an ACK _before_ the page content is actually read by
> the NIC.
>
> So if we rely on networking stacks to give the signal for page reuse, we
> can have a major security issue.

If you ignore the Xen case and consider just the native case then the
issue isn't page reuse in the sense of getting mapped into another
process, it's the same page in the same process but the process has
written something new to the buffer, e.g.

        memset(buf, 0xaa, 4096);
        write(fd, buf, 4096);
        memset(buf, 0x55, 4096);

(where fd is O_DIRECT on NFS) can result in 0x55 being seen on the wire
in the TCP retransmit.

If the retransmit is at the RPC layer then you get a resend of the NFS
write RPC, but the XDR sequence stuff catches that case (I think, memory
is fuzzy).

If the retransmit is at the TCP level then the TCP sequence/ack will
cause the receiver to ignore the corrupt version, but if you replace the
second memset with write_critical_secret_key(buf), then you have an
information leak.

Ian.

^ permalink raw reply	[flat|nested] 64+ messages in thread
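The native-case sequence above can be written out as a small C function. This is only a sketch under stated assumptions: `write_then_reuse` and `BUF_SIZE` are names made up here, the caller is assumed to supply the path (the interesting case being a file on an NFS mount) and to pass `O_DIRECT` in `extra_flags`, and nothing in the program can observe a retransmit itself, which would need a packet capture on the wire.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUF_SIZE 4096          /* illustrative size, matching the example above */

/*
 * Write a buffer of 0xaa and then immediately reuse it.
 * 'extra_flags' is where a caller would pass O_DIRECT.
 * Returns 0 on success, -1 on error.
 */
static int write_then_reuse(const char *path, int extra_flags)
{
    void *buf = NULL;
    int fd;
    int rc = -1;

    assert(path != NULL);

    /* O_DIRECT generally needs an aligned buffer; page alignment is a safe choice. */
    if (posix_memalign(&buf, BUF_SIZE, BUF_SIZE) != 0)
        return -1;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0600);
    if (fd < 0)
        goto out_free;

    memset(buf, 0xaa, BUF_SIZE);
    if (write(fd, buf, BUF_SIZE) == BUF_SIZE) {
        /*
         * write() has returned, so the application treats the buffer as
         * its own again. With O_DIRECT over a network filesystem, a
         * pending retransmit may still reference these pages.
         */
        memset(buf, 0x55, BUF_SIZE);
        rc = 0;
    }

    close(fd);
out_free:
    free(buf);
    return rc;
}
```

On a local filesystem without `O_DIRECT` the data is copied before `write()` returns, so the file ends up holding 0xaa; the hazard discussed in the thread only arises when the pages are handed to the network stack by reference.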
* Re: kernel panic in skb_copy_bits
  2013-07-04  9:52 ` Ian Campbell
@ 2013-07-04 10:12 ` Eric Dumazet
  2013-07-04 10:12 ` Eric Dumazet
  1 sibling, 0 replies; 64+ messages in thread
From: Eric Dumazet @ 2013-07-04 10:12 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Frank Blaschka, zheng.x.li, Jan Beulich, Stefano Stabellini,
      netdev, Joe Jin, linux-kernel, Xen Devel, Alex Bligh,
      David S. Miller

On Thu, 2013-07-04 at 10:52 +0100, Ian Campbell wrote:

> Might just be that no one has observed it with vmsplice()+splice()? Most
> of the time this happens silently and you'll probably never notice, it's
> just the behaviour of Xen which escalates the issue into one you can
> see.

The point I wanted to make is that nobody can seriously use vmsplice(),
unless the memory is never reused by the application, or the application
doesn't care about the security implications.

Because an application has no way to know when it's safe to reuse the
area for another usage.

[ Unless it uses the obscure and complex pagemap stuff
(Documentation/vm/pagemap.txt), but it's not asynchronous signaling and
not pluggable into epoll()/poll()/select() ]

> Xen's out of tree netback used to fix this by a destructor callback on
> page free, but that was a core mm patch in the hot memory free path
> which wasn't popular, and it doesn't solve anything for the non-Xen
> instances of this issue.

It _is_ a mm core patch which is needed, if we ever want to fix this
problem.

It looks like a typical COW issue to me.

If the page content is written while there is still a reference on this
page, we should allocate a new page and copy the previous content.

And this has little to do with networking.

^ permalink raw reply	[flat|nested] 64+ messages in thread
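The copy-on-write idea can be sketched in userspace with a reference-counted buffer. This is a toy model under stated assumptions, not the mm change being proposed: `cow_page`, `cow_alloc`, `cow_get`, `cow_put` and `cow_fill` are hypothetical names, and an explicit `cow_fill()` call stands in for the write fault that a real implementation would have to trap.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Toy page: a reference count plus the payload. */
struct cow_page {
    int refcount;
    unsigned char data[TOY_PAGE_SIZE];
};

/* Allocate a zeroed page owned by one holder; NULL on allocation failure. */
static struct cow_page *cow_alloc(void)
{
    struct cow_page *p = calloc(1, sizeof(*p));

    if (p)
        p->refcount = 1;
    return p;
}

/* Take an extra reference, as a queued (re)transmit would. */
static struct cow_page *cow_get(struct cow_page *p)
{
    p->refcount++;
    return p;
}

/* Drop a reference; the page is freed on the last put. */
static void cow_put(struct cow_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        free(p);
}

/*
 * Fill the writer's page with 'byte'. If anyone else still holds a
 * reference, copy the old content to a new page first and write there,
 * leaving the shared page untouched. *pp is updated to the page the
 * writer owns afterwards. Returns 0 on success, -1 on allocation failure.
 */
static int cow_fill(struct cow_page **pp, unsigned char byte)
{
    struct cow_page *p = *pp;

    if (p->refcount > 1) {
        struct cow_page *copy = cow_alloc();

        if (!copy)
            return -1;
        memcpy(copy->data, p->data, TOY_PAGE_SIZE);
        cow_put(p);            /* the writer gives up the shared page */
        *pp = p = copy;
    }
    memset(p->data, byte, TOY_PAGE_SIZE);
    return 0;
}
```

If a second holder (standing in for a queued retransmit) takes a reference with `cow_get()` before the writer calls `cow_fill(&buf, 0x55)`, the holder keeps seeing the old bytes while the writer ends up with a private copy. Trapping the write in the first place is the part this model leaves out, and it is the cost David Miller questions in his reply.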
* Re: kernel panic in skb_copy_bits
  2013-07-04 10:12 ` Eric Dumazet
@ 2013-07-04 12:57 ` Alex Bligh
  2013-07-04 12:57 ` Alex Bligh
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 64+ messages in thread
From: Alex Bligh @ 2013-07-04 12:57 UTC (permalink / raw)
  To: Eric Dumazet, Ian Campbell
  Cc: Frank Blaschka, zheng.x.li, Alex Bligh, Stefano Stabellini,
      netdev, Joe Jin, linux-kernel, Xen Devel, Jan Beulich,
      David S. Miller

--On 4 July 2013 03:12:10 -0700 Eric Dumazet <eric.dumazet@gmail.com> wrote:

> It looks like a typical COW issue to me.
>
> If the page content is written while there is still a reference on this
> page, we should allocate a new page and copy the previous content.
>
> And this has little to do with networking.

I suspect this would get more attention if we could make Ian's case
below trigger (a) outside Xen, (b) outside networking.

>         memset(buf, 0xaa, 4096);
>         write(fd, buf, 4096);
>         memset(buf, 0x55, 4096);
> (where fd is O_DIRECT on NFS) can result in 0x55 being seen on the wire
> in the TCP retransmit.

We know this should fail using O_DIRECT+NFS. We've had reports suggesting
it fails in O_DIRECT+iSCSI. However, that's been with a kernel panic
(under Xen) rather than data corruption as per the above.

Historical trawling suggests this is an issue with DRBD (see Ian's
original thread from the mists of time).

I don't quite understand why we aren't seeing corruption with standard
ATA devices + O_DIRECT and no Xen involved at all.

My memory is a bit misty on this but I had thought the reason why
this would NOT be solved simply by O_DIRECT taking a reference to
the page was that the O_DIRECT I/O completed (and thus the reference
would be freed up) before the networking stack had actually finished
with the page. If the O_DIRECT I/O did not complete until the page
was actually finished with, we wouldn't see the problem in the first
place. I may be completely off base here.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits
  2013-07-04 10:12 ` Eric Dumazet
  2013-07-04 12:57 ` Alex Bligh
  2013-07-04 12:57 ` Alex Bligh
@ 2013-07-04 21:32 ` David Miller
  2013-07-04 21:32 ` David Miller
  3 siblings, 0 replies; 64+ messages in thread
From: David Miller @ 2013-07-04 21:32 UTC (permalink / raw)
  To: eric.dumazet
  Cc: frank.blaschka, zheng.x.li, Ian.Campbell, stefano.stabellini,
      netdev, joe.jin, linux-kernel, xen-devel, JBeulich, alex

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 04 Jul 2013 03:12:10 -0700

> It looks like a typical COW issue to me.

Generically speaking, if we have to mess with page protections this
eliminates the performance gain from bypass/zerocopy/whatever that
these virtualization layers are doing.

But there may be other factors involved which might mitigate that.

^ permalink raw reply	[flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits
  2013-07-01  3:18 ` Joe Jin
  2013-07-01  8:11 ` Ian Campbell
  2013-07-01  8:11 ` Ian Campbell
@ 2013-07-01  8:29 ` Alex Bligh
  2013-07-01  8:29 ` Alex Bligh
  3 siblings, 0 replies; 64+ messages in thread
From: Alex Bligh @ 2013-07-01  8:29 UTC (permalink / raw)
  To: Joe Jin
  Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Eric Dumazet,
      Stefano Stabellini, Alex Bligh, netdev, linux-kernel, Xen Devel,
      Jan Beulich, David S. Miller

Joe,

> Do you know if there is a fix for the above? So far we also suspect
> that the grant page was unmapped too early; we were using 4.1 stable
> during our test.

A true fix? No, but I posted a patch set (see later email message
for a link) that you could forward port. The workaround is:

>> A workaround is to turn off O_DIRECT use by Xen as that ensures
>> the pages are copied. Xen 4.3 does this by default.
>>
>> I believe fixes for this are in 4.3 and 4.2.2 if using the
>> qemu upstream DM. Note these aren't real fixes, just a workaround
>> of a kernel bug.
>
> The guest is pvm, and disk model is xvbd, guest config file as below:
...
> I think this only for pvhvm/hvm?

I don't have much experience outside pvhvm/hvm, but I believe it
should work for any device.

Testing was simple - just find all (*) the references to O_DIRECT
in your device model and remove them!

(*)=you could be less lazy than me and find the right ones.

I am guessing it will be the same ones though.

-- 
Alex Bligh

^ permalink raw reply	[flat|nested] 64+ messages in thread
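In code, the workaround amounts to masking `O_DIRECT` out of the flags used to open the backing image, so that writes go through the dom0 page cache and are copied. The helper below is a hypothetical illustration only: `backing_open_flags` is not a function from Xen or qemu, and the real changes are in the commits linked elsewhere in this thread.

```c
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <fcntl.h>

/*
 * Return the open(2) flags a device model would use for a backing image
 * with the workaround applied: whatever was requested, minus O_DIRECT.
 */
static int backing_open_flags(int requested)
{
    int flags = requested & ~O_DIRECT;

    assert((flags & O_DIRECT) == 0);
    return flags;
}
```

A caller would then pass the result to `open()`, for example `open(path, backing_open_flags(O_RDWR | O_DIRECT))`, trading the zero-copy path for buffered I/O.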
* Re: kernel panic in skb_copy_bits
  2013-06-30  9:13 ` Alex Bligh
                   ` (3 preceding siblings ...)
  (?)
@ 2013-07-01  3:18 ` Joe Jin
  -1 siblings, 0 replies; 64+ messages in thread
From: Joe Jin @ 2013-07-01  3:18 UTC (permalink / raw)
  To: Alex Bligh
  Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Eric Dumazet,
      Stefano Stabellini, netdev, linux-kernel, Xen Devel, Jan Beulich,
      David S. Miller

On 06/30/13 17:13, Alex Bligh wrote:
>
>
> --On 28 June 2013 12:17:43 +0800 Joe Jin <joe.jin@oracle.com> wrote:
>
>> Find a similar issue
>> http://www.gossamer-threads.com/lists/xen/devel/265611 So copied to Xen
>> developer as well.
>
> I thought this sounded familiar. I haven't got the start of this
> thread, but what version of Xen are you running and what device
> model? If before 4.3, there is a page lifetime bug in the kernel
> (not the xen code) which can affect anything where the guest accesses
> the host's block stack and that in turn accesses the networking
> stack (it may in fact be wider than that). So, e.g. domU on
> iSCSI will do it. It tends to get triggered by a TCP retransmit
> or (on NFS) the RPC equivalent. Essentially block operation
> is considered complete, returning through xen and freeing the
> grant table entry, and yet something in the kernel (e.g. tcp
> retransmit) can still access the data. The nature of the bug
> is extensively discussed in that thread - you'll also find
> a reference to a thread on linux-nfs which concludes it
> isn't an nfs problem, and even some patches to fix it in the
> kernel adding reference counting.

Do you know if there is a fix for the above? So far we also suspect
that the grant page was unmapped too early; we were using 4.1 stable
during our test.

>
> A workaround is to turn off O_DIRECT use by Xen as that ensures
> the pages are copied. Xen 4.3 does this by default.
>
> I believe fixes for this are in 4.3 and 4.2.2 if using the
> qemu upstream DM. Note these aren't real fixes, just a workaround
> of a kernel bug.

The guest is pvm, and disk model is xvbd, guest config file as below:

vif = ['mac=00:21:f6:00:00:01,bridge=c0a80b00']
OVM_simple_name = 'Guest#1'
disk = ['file:/OVS/Repositories/0004fb000003000091e9eae94d1e907c/VirtualDisks/0004fb0000120000f78799dad800ef47.img,xvda,w', 'phy:/dev/mapper/360060e8010141870058b415700000002,xvdb,w', 'phy:/dev/mapper/360060e8010141870058b415700000003,xvdc,w']
bootargs = ''
uuid = '0004fb00-0006-0000-2b00-77a4766001ed'
on_reboot = 'restart'
cpu_weight = 27500
OVM_os_type = 'Oracle Linux 5'
cpu_cap = 0
maxvcpus = 8
OVM_high_availability = False
memory = 4096
OVM_description = ''
on_poweroff = 'destroy'
on_crash = 'restart'
bootloader = '/usr/bin/pygrub'
guest_os_type = 'linux'
name = '0004fb00000600002b0077a4766001ed'
vfb = ['type=vnc,vncunused=1,vnclisten=127.0.0.1,keymap=en-us']
vcpus = 8
OVM_cpu_compat_group = ''
OVM_domain_type = 'xen_pvm'

>
> To fix on a local build of xen you will need something like this:
> https://github.com/abligh/qemu-upstream-4.2-testing/commit/9a97c011e1a682eed9bc7195a25349eaf23ff3f9
> and something like this (NB: obviously insert your own git
> repo and commit numbers)
> https://github.com/abligh/xen/commit/f5c344afac96ced8b980b9659fb3e81c4a0db5ca
>

I think this only for pvhvm/hvm?

Thanks,
Joe

> Also note those fixes are (technically) unsafe for live migration
> unless there is an ordering change made in qemu's block open
> call.
>
> Of course this might be something completely different.
>

^ permalink raw reply	[flat|nested] 64+ messages in thread
* Re: kernel panic in skb_copy_bits From: Joe Jin @ 2013-06-28 4:17 UTC To: Eric Dumazet Cc: Frank Blaschka, zheng.x.li, Ian Campbell, Stefano Stabellini, netdev, linux-kernel, Xen Devel, Jan Beulich, David S. Miller Found a similar issue: http://www.gossamer-threads.com/lists/xen/devel/265611 so I have copied the Xen developers as well. On 06/27/13 13:31, Eric Dumazet wrote: > On Thu, 2013-06-27 at 10:58 +0800, Joe Jin wrote: >> Hi, >> >> When we do fail over test with iscsi + multipath by reset the switches >> on OVM(2.6.39) we hit the panic: >> >> BUG: unable to handle kernel paging request at ffff88006d9e8d48 >> IP: [<ffffffff812605bb>] memcpy+0xb/0x120 >> PGD 1798067 PUD 1fd2067 PMD 213f067 PTE 0 >> Oops: 0000 [#1] SMP >> CPU 7 >> Modules linked in: dm_nfs tun nfs fscache auth_rpcgss nfs_acl xen_blkback xen_netback xen_gntdev xen_evtchn lockd sunrpc bridge stp llc bonding be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio dm_round_robin dm_multipath libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport ixgbe dca sr_mod cdrom bnx2 radeon ttm drm_kms_helper drm snd_seq_dummy i2c_algo_bit i2c_core snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss serio_raw snd_pcm snd_timer snd soundcore snd_page_alloc iTCO_wdt pcspkr iTCO_vendor_support pata_acpi dcdbas i5k_amb ata_generic hwmon floppy ghes i5000_edac edac_core hed dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage lpfc scsi_transport_fc scsi_tgt ata_piix sg shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod crc_t10dif ext3 jbd mbcache >> >> >> Pid: 0, comm: swapper Tainted: G W 2.6.39-300.32.1.el5uek #1 Dell Inc.
PowerEdge 2950/0DP246 >> RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120 >> RSP: e02b:ffff8801003c3d58 EFLAGS: 00010246 >> RAX: ffff880076b9e280 RBX: ffff8800714d2c00 RCX: 0000000000000057 >> RDX: 0000000000000000 RSI: ffff88006d9e8d48 RDI: ffff880076b9e280 >> RBP: ffff8801003c3dc0 R08: 00000000000bf723 R09: 0000000000000000 >> R10: 0000000000000000 R11: 000000000000000a R12: 0000000000000034 >> R13: 0000000000000034 R14: 00000000000002b8 R15: 00000000000005a8 >> FS: 00007fc1e852a6e0(0000) GS:ffff8801003c0000(0000) knlGS:0000000000000000 >> CS: e033 DS: 002b ES: 002b CR0: 000000008005003b >> CR2: ffff88006d9e8d48 CR3: 000000006370b000 CR4: 0000000000002660 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process swapper (pid: 0, threadinfo ffff880077ac0000, task ffff880077abe240) >> Stack: >> ffffffff8142db21 0000000000000000 ffff880076b9e280 ffff8800637097f0 >> 000002ec00000000 00000000000002b8 ffff880077ac0000 0000000000000000 >> ffff8800637097f0 ffff880066c9a7c0 00000000fffffdb4 000000000000024c >> Call Trace: >> <IRQ> >> [<ffffffff8142db21>] ? skb_copy_bits+0x1c1/0x2e0 >> [<ffffffff8142f173>] skb_copy+0xf3/0x120 >> [<ffffffff81447fbc>] neigh_timer_handler+0x1ac/0x350 >> [<ffffffff810573fe>] ? account_idle_ticks+0xe/0x10 >> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180 >> [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110 >> [<ffffffff81447e10>] ? neigh_alloc+0x180/0x180 >> [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220 >> [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0 >> [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70 >> [<ffffffff81511d3c>] call_softirq+0x1c/0x30 >> [<ffffffff810172e5>] do_softirq+0x65/0xa0 >> [<ffffffff8107656b>] irq_exit+0xab/0xc0 >> [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50 >> [<ffffffff81511d8e>] xen_do_hypervisor_callback+0x1e/0x30 >> <EOI> >> [<ffffffff810013aa>] ? 
xen_hypercall_sched_op+0xa/0x20 >> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 >> [<ffffffff8100a0b0>] ? xen_safe_halt+0x10/0x20 >> [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170 >> [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0 >> [<ffffffff8100a8c9>] ? xen_irq_enable_direct_reloc+0x4/0x4 >> [<ffffffff814f7bbe>] ? cpu_bringup_and_idle+0xe/0x10 >> Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c >> RIP [<ffffffff812605bb>] memcpy+0xb/0x120 >> RSP <ffff8801003c3d58> >> CR2: ffff88006d9e8d48 >> >> Reviewed vmcore I found the skb->users is 1 at the moment, checked network neighbour >> history I found skb_get() be replaced by skb_copy by commit 7e36763b2c: >> >> commit 7e36763b2c204d59de4e88087f84a2c0c8421f25 >> Author: Frank Blaschka <frank.blaschka@de.ibm.com> >> Date: Mon Mar 3 12:16:04 2008 -0800 >> >> [NET]: Fix race in generic address resolution. >> >> neigh_update sends skb from neigh->arp_queue while neigh_timer_handler >> has increased skbs refcount and calls solicit with the >> skb. neigh_timer_handler should not increase skbs refcount but make a >> copy of the skb and do solicit with the copy. >> >> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> >> Signed-off-by: David S. Miller <davem@davemloft.net> >> >> So can you please give some details of the race? per vmcore seems like the skb data >> be freed, I suspected skb_get() lost at somewhere? >> I reverted above commit the panic not occurred during our testing. >> >> Any input will appreciate! > > Well, fact is that your crash is happening in skb_copy(). > > Frank patch is OK. I suspect using skb_clone() would work too, > so if these skb were fclone ready, chance of an GFP_ATOMIC allocation > error would be smaller. > > So something is providing a wrong skb at the very beginning. 
> > You could try to do a early skb_copy to catch the bug and see in the > stack trace what produced this buggy skb. > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c > index 5c56b21..a7a51fd 100644 > --- a/net/core/neighbour.c > +++ b/net/core/neighbour.c > @@ -1010,6 +1010,7 @@ int __neigh_event_send(struct neighbour *neigh, struct sk_buff *skb) > NEIGH_CACHE_STAT_INC(neigh->tbl, unres_discards); > } > skb_dst_force(skb); > + kfree_skb(skb_copy(skb, GFP_ATOMIC)); > __skb_queue_tail(&neigh->arp_queue, skb); > neigh->arp_queue_len_bytes += skb->truesize; > } > > BUG: unable to handle kernel paging request at ffff8800488db8dc IP: [<ffffffff812605bb>] memcpy+0xb/0x120 PGD 1796067 PUD 20e5067 PMD 212a067 PTE 0 Oops: 0000 [#1] SMP CPU 13 Modules linked in: ocfs2 jbd2 xen_blkback xen_netback xen_gntdev xen_evtchn netconsole i2c_dev i2c_core ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs lockd sunrpc dm_round_robin dm_multipath bridge stp llc bonding be2iscsi iscsi_boot_sysfs iscsi_tcp bnx2i cnic uio ipv6 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi xenfs xen_privcmd video sbs sbshc hed acpi_memhotplug acpi_ipmi ipmi_msghandler parport_pc lp parport serio_raw ixgbe hpilo tg3 hpwdt dca snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd iTCO_wdt iTCO_vendor_support soundcore snd_page_alloc pcspkr pata_acpi ata_generic dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage ata_piix sg shpchp hpsa cciss sd_mod crc_t10dif ext3 jbd mbcache Pid: 0, comm: swapper Not tainted 2.6.39-300.32.1.el5uek.bug16929255v5 #1 HP ProLiant DL360p Gen8 RIP: e030:[<ffffffff812605bb>] [<ffffffff812605bb>] memcpy+0xb/0x120 RSP: e02b:ffff88005a9a3b68 EFLAGS: 00010202 RAX: ffff8800200f0280 RBX: 0000000000000724 RCX: 00000000000000e4 RDX: 0000000000000004 RSI: ffff8800488db8dc RDI: ffff8800200f0280 RBP: ffff88005a9a3bd0 R08: 0000000000000004 R09:
ffff880052824980 R10: 0000000000000000 R11: 0000000000015048 R12: 0000000000000034 R13: 0000000000000034 R14: 00000000000022f4 R15: ffff880021208ab0 FS: 00007fe8737c96e0(0000) GS:ffff88005a9a0000(0000) knlGS:0000000000000000 CS: e033 DS: 002b ES: 002b CR0: 000000008005003b CR2: ffff8800488db8dc CR3: 000000004fb38000 CR4: 0000000000002660 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff880054d36000, task ffff880054d343c0) Stack: ffffffff8142dac7 0000000000000000 00000000ffffffff ffff8800200f0280 0000075800000000 0000000000000724 ffff880054d36000 0000000000000000 00000000fffffdb4 ffff880052824980 ffff880021208ab0 000000000000024c Call Trace: <IRQ> [<ffffffff8142dac7>] ? skb_copy_bits+0x167/0x290 [<ffffffff8142f0b5>] skb_copy+0x85/0xb0 [<ffffffff8144864d>] __neigh_event_send+0x18d/0x200 [<ffffffff81449a42>] neigh_resolve_output+0x162/0x1b0 [<ffffffff81477046>] ip_finish_output+0x146/0x320 [<ffffffff814754a5>] ip_output+0x85/0xd0 [<ffffffff814758d9>] ip_local_out+0x29/0x30 [<ffffffff814761e0>] ip_queue_xmit+0x1c0/0x3d0 [<ffffffff8148d3ef>] tcp_transmit_skb+0x40f/0x520 [<ffffffff8148e5ff>] tcp_retransmit_skb+0x16f/0x2e0 [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0 [<ffffffff814905ad>] tcp_retransmit_timer+0x18d/0x4a0 [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0 [<ffffffff81490994>] tcp_write_timer+0xd4/0x100 [<ffffffff8107dbaa>] call_timer_fn+0x4a/0x110 [<ffffffff814908c0>] ? tcp_retransmit_timer+0x4a0/0x4a0 [<ffffffff8107f82a>] run_timer_softirq+0x13a/0x220 [<ffffffff81075c39>] __do_softirq+0xb9/0x1d0 [<ffffffff810d9678>] ? handle_percpu_irq+0x48/0x70 [<ffffffff81511b7c>] call_softirq+0x1c/0x30 [<ffffffff810172e5>] do_softirq+0x65/0xa0 [<ffffffff8107656b>] irq_exit+0xab/0xc0 [<ffffffff812f97d5>] xen_evtchn_do_upcall+0x35/0x50 [<ffffffff81511bce>] xen_do_hypervisor_callback+0x1e/0x30 <EOI> [<ffffffff810013aa>] ? 
xen_hypercall_sched_op+0xa/0x20 [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20 [<ffffffff8100a0d0>] ? xen_safe_halt+0x10/0x20 [<ffffffff8101dfeb>] ? default_idle+0x5b/0x170 [<ffffffff81014ac6>] ? cpu_idle+0xc6/0xf0 [<ffffffff8100a8e9>] ? xen_irq_enable_direct_reloc+0x4/0x4 [<ffffffff814f7a2e>] ? cpu_bringup_and_idle+0xe/0x10 Code: 01 c6 43 4c 04 19 c0 4c 8b 65 f0 4c 8b 6d f8 83 e0 fc 83 c0 08 88 43 4d 48 8b 5d e8 c9 c3 90 90 48 89 f8 89 d1 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c RIP [<ffffffff812605bb>] memcpy+0xb/0x120 Per the vmcore, the socket info is as below: ------------------------------------------------------------------------------ <struct tcp_sock 0xffff88004d344e00> TCP tcp 10.1.1.11:42147 10.1.1.21:3260 FIN_WAIT1 windows: rcv=122124, snd=65535 advmss=8948 rcv_ws=1 snd_ws=0 nonagle=1 sack_ok=0 tstamp_ok=1 rmem_alloc=0, wmem_alloc=10229 rx_queue=0, tx_queue=149765 rcvbuf=262142, sndbuf=262142 rcv_tstamp=51.4 s, lsndtime=0.0 s ago -- Retransmissions -- retransmits=7, ca_state=TCP_CA_Disorder ------------------------------------------------------------------------------ When the socket state moves to FIN_WAIT1, will it clean up all skbs or not? Thanks, Joe
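To make the skb_get() versus skb_copy() distinction discussed in this message concrete, here is a small user-space sketch in C. It is only an illustration under stated assumptions: `toy_skb` and the `toy_*` helpers are hypothetical stand-ins and not the kernel's `struct sk_buff` API. The point it tries to show is that taking a reference never touches the payload, while a copy must read every byte of it, which is consistent with the faults above landing in memcpy() under skb_copy_bits(). `toy_probe()` mirrors the idea of the debug patch: copy and immediately free at enqueue time, so that a bad buffer would fault in the context that produced it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff; not the kernel structure. */
struct toy_skb {
    int users;            /* reference count, in the spirit of skb->users */
    size_t len;           /* payload length in bytes */
    unsigned char *data;  /* payload owned by this buffer */
};

/* Allocate a buffer holding a private copy of payload. */
struct toy_skb *toy_alloc(const void *payload, size_t len)
{
    struct toy_skb *skb = malloc(sizeof(*skb));
    if (!skb)
        return NULL;
    skb->data = malloc(len ? len : 1);
    if (!skb->data) {
        free(skb);
        return NULL;
    }
    memcpy(skb->data, payload, len);
    skb->len = len;
    skb->users = 1;
    return skb;
}

/* Analogue of skb_get(): share the same buffer by taking a reference. */
struct toy_skb *toy_get(struct toy_skb *skb)
{
    skb->users++;
    return skb;
}

/* Analogue of skb_copy(): a private duplicate; reads the whole payload. */
struct toy_skb *toy_copy(const struct toy_skb *skb)
{
    return toy_alloc(skb->data, skb->len);
}

/* Analogue of kfree_skb(): drop one reference; returns 1 if freed. */
int toy_put(struct toy_skb *skb)
{
    assert(skb->users > 0);
    if (--skb->users > 0)
        return 0;
    free(skb->data);
    free(skb);
    return 1;
}

/* Copy-and-free probe: forces a full read of the payload right now.
 * Returns 0 on success, -1 if the copy could not be allocated. */
int toy_probe(const struct toy_skb *skb)
{
    struct toy_skb *tmp = toy_copy(skb);
    if (!tmp)
        return -1;
    toy_put(tmp);
    return 0;
}
```

In this model, `toy_get()` leaves two users on one buffer, while `toy_copy()` yields an independent buffer with its own count; neither helps if the payload memory behind the original has already gone away, which is the situation the vmcore seems to point to.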