* ipv6 fragmentation-related panic in netfilter
@ 2013-10-29 21:07 Tomas Hlavacek
  2013-10-30  0:07 ` Patrick McHardy
  0 siblings, 1 reply; 8+ messages in thread
From: Tomas Hlavacek @ 2013-10-29 21:07 UTC (permalink / raw)
  To: netdev; +Cc: netfilter-devel

Hi!

I have encountered the following condition on 3 distinct hosts in the last
few days. The hosts are failing several times a day (4 to 7 times) and it
usually happens at roughly the same time. The affected hosts have almost
exactly the same HW, but run different kernel versions, from the Debian
(Wheezy) default 3.2 up to 3.11.6.


      KERNEL: /usr/src/vmlinux                
    DUMPFILE: dump.201310291545  [PARTIAL DUMP]
        CPUS: 16
        DATE: Tue Oct 29 15:45:11 2013
      UPTIME: 06:04:17
LOAD AVERAGE: 0.04, 0.25, 0.32
       TASKS: 211
    NODENAME: fw03a
     RELEASE: 3.11.6
     VERSION: #2 SMP Mon Oct 28 20:29:03 CET 2013
     MACHINE: x86_64  (2393 Mhz)
      MEMORY: 12 GB
       PANIC: 
         PID: 0
     COMMAND: "swapper/1"
        TASK: ffff8801b90ac7b0  (1 of 16)  [THREAD_INFO: ffff8801b90b4000]
         CPU: 1
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 0      TASK: ffff8801b90ac7b0  CPU: 1   COMMAND: "swapper/1"
 #0 [ffff8801bfc235d0] machine_kexec at ffffffff81032f68
 #1 [ffff8801bfc23610] crash_kexec at ffffffff8109e055
 #2 [ffff8801bfc236e0] oops_end at ffffffff81005e90
 #3 [ffff8801bfc23700] do_invalid_op at ffffffff81003004
 #4 [ffff8801bfc237a0] invalid_op at ffffffff8142b368
    [exception RIP: pskb_expand_head+596]
    RIP: ffffffff81333c74  RSP: ffff8801bfc23850  RFLAGS: 00010202
    RAX: 0000000000000003  RBX: ffff8801b6d99080  RCX: 0000000000000020
    RDX: 00000000000005f4  RSI: 0000000000000000  RDI: ffff8801b6d99080
    RBP: 0000000040115833   R8: 00000000000002c0   R9: ffff8801b8cf2c00
    R10: 000000000000ffff  R11: 00000000197033fe  R12: 0000000000000000
    R13: ffff880337b59a00  R14: ffffffffa03fb160  R15: ffff880337b59a00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #5 [ffff8801bfc23858] __nf_conntrack_confirm at ffffffffa03ace16 [nf_conntrack]
 #6 [ffff8801bfc238c8] vlan_netlink_fini at ffffffffa03fb160 [8021q]
 #7 [ffff8801bfc23928] dev_queue_xmit at ffffffff81342d79
 #8 [ffff8801bfc23978] ip6_finish_output2 at ffffffff813d26ee
 #9 [ffff8801bfc239c8] ip6_forward at ffffffff813d44be
#10 [ffff8801bfc23a48] __ipv6_conntrack_in at ffffffffa034f7b6 [nf_conntrack_ipv6]
#11 [ffff8801bfc23a98] nf_iterate at ffffffff8136ba0d
#12 [ffff8801bfc23af8] nf_hook_slow at ffffffff8136baae
#13 [ffff8801bfc23b68] nf_ct_frag6_output at ffffffffa039decf [nf_defrag_ipv6]
#14 [ffff8801bfc23bd8] ipv6_defrag at ffffffffa039d0c1 [nf_defrag_ipv6]
#15 [ffff8801bfc23c18] nf_iterate at ffffffff8136ba0d
#16 [ffff8801bfc23c78] nf_hook_slow at ffffffff8136baae
#17 [ffff8801bfc23ce8] ipv6_rcv at ffffffff813d59f5
#18 [ffff8801bfc23d38] __netif_receive_skb_core at ffffffff813410db
#19 [ffff8801bfc23db8] napi_gro_receive at ffffffff81341d88
#20 [ffff8801bfc23dd8] igb_poll at ffffffffa0035867 [igb]
#21 [ffff8801bfc23e88] net_rx_action at ffffffff81341ac9
#22 [ffff8801bfc23ed8] __do_softirq at ffffffff81049fb6
#23 [ffff8801bfc23f38] call_softirq at ffffffff8142b4fc
#24 [ffff8801bfc23f50] do_softirq at ffffffff8100481d
#25 [ffff8801bfc23f80] do_IRQ at ffffffff810043bb
--- <IRQ stack> ---
#26 [ffff8801b90b5db8] ret_from_intr at ffffffff81429baa
    [exception RIP: cpuidle_enter_state+86]
    RIP: ffffffff813107a6  RSP: ffff8801b90b5e68  RFLAGS: 00000216
    RAX: 000000000007ff2b  RBX: 0000000140523c4c  RCX: 0000000000000018
    RDX: 0000000225c17d03  RSI: 0000000000000000  RDI: ffffffff81812600
    RBP: 0000000000000004   R8: 0000000000000018   R9: 00000000000006cf
    R10: 0000000000000001  R11: 0000000000000006  R12: 0000000100523c4e
    R13: 0000000000000000  R14: ffffffff81066415  R15: 0000000000000086
    ORIG_RAX: ffffffffffffff94  CS: 0010  SS: 0018
#27 [ffff8801b90b5eb0] cpuidle_idle_call at ffffffff813108ce
#28 [ffff8801b90b5ee0] arch_cpu_idle at ffffffff8100b769
#29 [ffff8801b90b5ef0] cpu_startup_entry at ffffffff81086b1d
#30 [ffff8801b90b5f30] start_secondary at ffffffff8102af40

I am investigating at the moment. All suggestions/help would be 
appreciated.

Tomas


* Re: ipv6 fragmentation-related panic in netfilter
  2013-10-29 21:07 ipv6 fragmentation-related panic in netfilter Tomas Hlavacek
@ 2013-10-30  0:07 ` Patrick McHardy
  2013-11-01  8:45   ` Steffen Klassert
  0 siblings, 1 reply; 8+ messages in thread
From: Patrick McHardy @ 2013-10-30  0:07 UTC (permalink / raw)
  To: Tomas Hlavacek; +Cc: netdev, netfilter-devel

On Tue, Oct 29, 2013 at 10:07:59PM +0100, Tomas Hlavacek wrote:
> Hi!
> 
> I have encountered the following condition on 3 distinct hosts in the
> last few days. The hosts are failing several times a day (4 to 7 times)
> and it usually happens at roughly the same time. The affected hosts have
> almost exactly the same HW, but run different kernel versions, from the
> Debian (Wheezy) default 3.2 up to 3.11.6.
> 
> 
>      KERNEL: /usr/src/vmlinux
>    DUMPFILE: dump.201310291545  [PARTIAL DUMP]
>        CPUS: 16
>        DATE: Tue Oct 29 15:45:11 2013
>      UPTIME: 06:04:17
> LOAD AVERAGE: 0.04, 0.25, 0.32
>       TASKS: 211
>    NODENAME: fw03a
>     RELEASE: 3.11.6
>     VERSION: #2 SMP Mon Oct 28 20:29:03 CET 2013
>     MACHINE: x86_64  (2393 Mhz)
>      MEMORY: 12 GB
>       PANIC:
>         PID: 0
>     COMMAND: "swapper/1"
>        TASK: ffff8801b90ac7b0  (1 of 16)  [THREAD_INFO: ffff8801b90b4000]
>         CPU: 1
>       STATE: TASK_RUNNING (PANIC)
> 
> crash> bt
> PID: 0      TASK: ffff8801b90ac7b0  CPU: 1   COMMAND: "swapper/1"
> #0 [ffff8801bfc235d0] machine_kexec at ffffffff81032f68
> #1 [ffff8801bfc23610] crash_kexec at ffffffff8109e055
> #2 [ffff8801bfc236e0] oops_end at ffffffff81005e90
> #3 [ffff8801bfc23700] do_invalid_op at ffffffff81003004
> #4 [ffff8801bfc237a0] invalid_op at ffffffff8142b368
>    [exception RIP: pskb_expand_head+596]
>    RIP: ffffffff81333c74  RSP: ffff8801bfc23850  RFLAGS: 00010202
>    RAX: 0000000000000003  RBX: ffff8801b6d99080  RCX: 0000000000000020
>    RDX: 00000000000005f4  RSI: 0000000000000000  RDI: ffff8801b6d99080
>    RBP: 0000000040115833   R8: 00000000000002c0   R9: ffff8801b8cf2c00
>    R10: 000000000000ffff  R11: 00000000197033fe  R12: 0000000000000000
>    R13: ffff880337b59a00  R14: ffffffffa03fb160  R15: ffff880337b59a00
>    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> #5 [ffff8801bfc23858] __nf_conntrack_confirm at ffffffffa03ace16 [nf_conntrack]
> #6 [ffff8801bfc238c8] vlan_netlink_fini at ffffffffa03fb160 [8021q]
> #7 [ffff8801bfc23928] dev_queue_xmit at ffffffff81342d79
> #8 [ffff8801bfc23978] ip6_finish_output2 at ffffffff813d26ee
> #9 [ffff8801bfc239c8] ip6_forward at ffffffff813d44be
> #10 [ffff8801bfc23a48] __ipv6_conntrack_in at ffffffffa034f7b6 [nf_conntrack_ipv6]
> #11 [ffff8801bfc23a98] nf_iterate at ffffffff8136ba0d
> #12 [ffff8801bfc23af8] nf_hook_slow at ffffffff8136baae
> #13 [ffff8801bfc23b68] nf_ct_frag6_output at ffffffffa039decf [nf_defrag_ipv6]
> #14 [ffff8801bfc23bd8] ipv6_defrag at ffffffffa039d0c1 [nf_defrag_ipv6]
> #15 [ffff8801bfc23c18] nf_iterate at ffffffff8136ba0d
> #16 [ffff8801bfc23c78] nf_hook_slow at ffffffff8136baae
> #17 [ffff8801bfc23ce8] ipv6_rcv at ffffffff813d59f5
> #18 [ffff8801bfc23d38] __netif_receive_skb_core at ffffffff813410db
> #19 [ffff8801bfc23db8] napi_gro_receive at ffffffff81341d88
> #20 [ffff8801bfc23dd8] igb_poll at ffffffffa0035867 [igb]
> #21 [ffff8801bfc23e88] net_rx_action at ffffffff81341ac9
> #22 [ffff8801bfc23ed8] __do_softirq at ffffffff81049fb6
> #23 [ffff8801bfc23f38] call_softirq at ffffffff8142b4fc
> #24 [ffff8801bfc23f50] do_softirq at ffffffff8100481d
> #25 [ffff8801bfc23f80] do_IRQ at ffffffff810043bb
> --- <IRQ stack> ---
> #26 [ffff8801b90b5db8] ret_from_intr at ffffffff81429baa
>    [exception RIP: cpuidle_enter_state+86]
>    RIP: ffffffff813107a6  RSP: ffff8801b90b5e68  RFLAGS: 00000216
>    RAX: 000000000007ff2b  RBX: 0000000140523c4c  RCX: 0000000000000018
>    RDX: 0000000225c17d03  RSI: 0000000000000000  RDI: ffffffff81812600
>    RBP: 0000000000000004   R8: 0000000000000018   R9: 00000000000006cf
>    R10: 0000000000000001  R11: 0000000000000006  R12: 0000000100523c4e
>    R13: 0000000000000000  R14: ffffffff81066415  R15: 0000000000000086
>    ORIG_RAX: ffffffffffffff94  CS: 0010  SS: 0018
> #27 [ffff8801b90b5eb0] cpuidle_idle_call at ffffffff813108ce
> #28 [ffff8801b90b5ee0] arch_cpu_idle at ffffffff8100b769
> #29 [ffff8801b90b5ef0] cpu_startup_entry at ffffffff81086b1d
> #30 [ffff8801b90b5f30] start_secondary at ffffffff8102af40
> 
> I am investigating at the moment. All suggestions/help would be
> appreciated.

The problem is that the reassembled packet is referenced by the individual
fragments, so we trigger the BUG_ON in pskb_expand_head(). In this
particular case the condition we BUG() on is actually OK, but I'm looking
for a way to fix this without special-casing. I hope to have a patch for
testing in the next few hours.
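
To make the failure mode concrete, here is a minimal userspace sketch of
the reference counting involved. It assumes only what is described above:
each fragment holds a reference on the reassembled packet, and expanding
the head of a packet with more than one holder is treated as a bug. The
toy_* names are hypothetical stand-ins, not kernel symbols, and the real
code BUG()s where this sketch returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct sk_buff; not the kernel type. */
struct toy_skb {
	int users;              /* reference count, modeled on skb->users */
	struct toy_skb *reasm;  /* modeled on the skb->nfct_reasm pointer */
};

/* Same idea as skb_shared(): true when more than one holder exists. */
static bool toy_skb_shared(const struct toy_skb *skb)
{
	return skb->users != 1;
}

/* A fragment takes a reference on the reassembled packet. */
static void toy_attach_reasm(struct toy_skb *frag, struct toy_skb *reasm)
{
	reasm->users++;
	frag->reasm = reasm;
}

/* Expanding the head is only legal on an unshared packet. The kernel
 * BUG()s in this situation; the sketch just reports it. */
static bool toy_can_expand_head(const struct toy_skb *skb)
{
	return !toy_skb_shared(skb);
}

/* Build a reassembled packet referenced by nfrags fragments (0..8) and
 * report whether its head could still be expanded. */
static bool toy_demo_expand_ok(int nfrags)
{
	struct toy_skb reasm = { .users = 1, .reasm = NULL };
	struct toy_skb frags[8];
	int i;

	assert(nfrags >= 0 && nfrags <= 8);
	for (i = 0; i < nfrags; i++) {
		frags[i].users = 1;
		frags[i].reasm = NULL;
		toy_attach_reasm(&frags[i], &reasm);
	}
	return toy_can_expand_head(&reasm);
}
```

With no fragments attached the packet is unshared and the expansion is
allowed; as soon as a single fragment references it, the check fails.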


* Re: ipv6 fragmentation-related panic in netfilter
  2013-10-30  0:07 ` Patrick McHardy
@ 2013-11-01  8:45   ` Steffen Klassert
  2013-11-01  9:25     ` Patrick McHardy
  0 siblings, 1 reply; 8+ messages in thread
From: Steffen Klassert @ 2013-11-01  8:45 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Tomas Hlavacek, netdev, netfilter-devel

On Wed, Oct 30, 2013 at 12:07:11AM +0000, Patrick McHardy wrote:
> 
> The problem is that the reassembled packet is referenced by the individual
> fragments, so we trigger the BUG_ON in pskb_expand_head(). In this
> particular case the case we BUG() on is actually OK, but I'm looking at
> a way we can fix this without special casing. Hope to have a patch for
> testing in the next hours.

Just for the record: I'm observing similar, quite reproducible crashes when
receiving fragmented ICMP echo request packets on an IPsec gateway with
nf_conntrack_ipv6.

Since git commit 58a317f10 ("netfilter: ipv6: add IPv6 NAT support"),
netfilter might pass a reassembled IPv6 packet with a shared skb and
local_df = 1 to the okfn. In the xfrm case, __xfrm6_output() fragments
the packet again, and when the headroom is adjusted later, we crash
because the skb is shared.

I can fix it by checking for a shared skb in ip6_fragment() and falling
back to slow-path fragmentation in that case. But we never needed such a
check in ip6_fragment(), so it may be better to fix it in netfilter.
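
The workaround described above can be sketched as a path selection. This
is a simplified userspace model, not the actual ip6_fragment() logic: it
assumes the fast path reuses the packet's existing fragment list in place
and is therefore only safe for an unshared packet, while the slow path
copies data into fresh buffers. All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_frag_path { TOY_FAST_PATH, TOY_SLOW_PATH };

/* Hypothetical stand-in for the few packet properties that matter here. */
struct toy_pkt {
	int users;          /* reference count, modeled on skb->users */
	bool has_frag_list; /* packet still carries its fragment list */
};

/* Pick the fragmentation path. The first test is the extra check under
 * discussion: a shared packet must not be modified in place. */
static enum toy_frag_path toy_pick_frag_path(const struct toy_pkt *pkt)
{
	assert(pkt->users >= 1);

	if (pkt->users != 1)
		return TOY_SLOW_PATH;
	if (pkt->has_frag_list)
		return TOY_FAST_PATH;
	return TOY_SLOW_PATH;
}
```

The cost of this approach is an additional test on every fragmented
packet for a condition that only netfilter creates, which is the argument
for fixing it on the netfilter side instead.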


* Re: ipv6 fragmentation-related panic in netfilter
  2013-11-01  8:45   ` Steffen Klassert
@ 2013-11-01  9:25     ` Patrick McHardy
  2013-11-19 11:11       ` Wolfgang Walter
  0 siblings, 1 reply; 8+ messages in thread
From: Patrick McHardy @ 2013-11-01  9:25 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Tomas Hlavacek, netdev, netfilter-devel

On Fri, Nov 01, 2013 at 09:45:29AM +0100, Steffen Klassert wrote:
> On Wed, Oct 30, 2013 at 12:07:11AM +0000, Patrick McHardy wrote:
> > 
> > The problem is that the reassembled packet is referenced by the individual
> > fragments, so we trigger the BUG_ON in pskb_expand_head(). In this
> > particular case the case we BUG() on is actually OK, but I'm looking at
> > a way we can fix this without special casing. Hope to have a patch for
> > testing in the next hours.
> 
> Just for the record. I'm observing similar, quite reproducible crashes when
> receiving fragmented icmp echo request packets on an IPsec gateway with
> nf_conntrack_ipv6.
> 
> Since git commit 58a317f10 ("netfilter: ipv6: add IPv6 NAT support")
> netfilter might insert a reassembled ipv6 packet with a shared skb and
> local_df = 1 to the ok function. In case of xfrm, __xfrm6_output()
> fragments the packet again and when adjusting the headroom later, we
> crash because of a shared skb.
> 
> I can fix it by checking for a shared skb in ip6_fragment() and do
> slow path fragmentation then. But we never needed such a check in
> ip6_fragment(), so it's maybe better to fix it in netfilter.

So what seems to be happening is that this case in __ipv6_conntrack_in()
triggers:

        /* Conntrack helpers need the entire reassembled packet in the
         * POST_ROUTING hook. In case of unconfirmed connections NAT
         * might reassign a helper, so the entire packet is also
         * required.
         */
        ct = nf_ct_get(reasm, &ctinfo);
        if (ct != NULL && !nf_ct_is_untracked(ct)) {
                help = nfct_help(ct);
                if ((help && help->helper) || !nf_ct_is_confirmed(ct)) {
                        nf_conntrack_get_reasm(reasm);
                        NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
                                       (struct net_device *)in,
                                       (struct net_device *)out,
                                       okfn, NF_IP6_PRI_CONNTRACK + 1);

Since this code is called while walking through the fragment chain, we have
extra references to the reassembled skb. So I think what we need to do is
to release the fragment chain before calling NF_HOOK_THRESH() and indicate
this to nf_ct_frag6_output() so it will stop processing the chain immediately.

I'll give it a try, will let you know when I have a patch for testing.
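
The idea of releasing the fragment chain first can be sketched in
userspace as follows. This models only the reference counting, under the
assumption that every fragment in the chain holds exactly one reference
on the reassembled packet; it is not the patch itself, and the chain_*
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff; not the kernel type. */
struct chain_skb {
	int users;               /* reference count */
	struct chain_skb *next;  /* next fragment in the chain */
	struct chain_skb *reasm; /* reassembled packet this fragment pins */
};

/* Walk the chain, unlink each fragment and drop its reference on the
 * reassembled packet. */
static void chain_release(struct chain_skb *head)
{
	struct chain_skb *s = head, *next;

	while (s) {
		next = s->next;
		s->next = NULL;
		if (s->reasm) {
			s->reasm->users--;
			s->reasm = NULL;
		}
		s = next;
	}
}

/* Three fragments pin the reassembled packet; after releasing the chain
 * only the original reference is left. Returns the final count. */
static int chain_demo_users_after_release(void)
{
	struct chain_skb reasm = { .users = 1, .next = NULL, .reasm = NULL };
	struct chain_skb frags[3];
	int i;

	for (i = 0; i < 3; i++) {
		frags[i].users = 1;
		frags[i].next = (i + 1 < 3) ? &frags[i + 1] : NULL;
		frags[i].reasm = &reasm;
		reasm.users++;
	}
	assert(reasm.users == 4);

	chain_release(&frags[0]);
	return reasm.users;
}
```

Once the chain has been released, the reassembled packet is unshared
again and can be handed on to code that may need to expand its head.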


* Re: ipv6 fragmentation-related panic in netfilter
  2013-11-01  9:25     ` Patrick McHardy
@ 2013-11-19 11:11       ` Wolfgang Walter
  2013-11-19 12:40         ` Hannes Frederic Sowa
  0 siblings, 1 reply; 8+ messages in thread
From: Wolfgang Walter @ 2013-11-19 11:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Steffen Klassert, Tomas Hlavacek, netdev, netfilter-devel

On Friday, 1 November 2013, 09:25:37, Patrick McHardy wrote:
> On Fri, Nov 01, 2013 at 09:45:29AM +0100, Steffen Klassert wrote:
> > On Wed, Oct 30, 2013 at 12:07:11AM +0000, Patrick McHardy wrote:
> > > The problem is that the reassembled packet is referenced by the
> > > individual
> > > fragments, so we trigger the BUG_ON in pskb_expand_head(). In this
> > > particular case the case we BUG() on is actually OK, but I'm looking at
> > > a way we can fix this without special casing. Hope to have a patch for
> > > testing in the next hours.
> > 
> > Just for the record. I'm observing similar, quite reproducible crashes
> > when
> > receiving fragmented icmp echo request packets on an IPsec gateway with
> > nf_conntrack_ipv6.
> > 
> > Since git commit 58a317f10 ("netfilter: ipv6: add IPv6 NAT support")
> > netfilter might insert a reassembled ipv6 packet with a shared skb and
> > local_df = 1 to the ok function. In case of xfrm, __xfrm6_output()
> > fragments the packet again and when adjusting the headroom later, we
> > crash because of a shared skb.
> > 
> > I can fix it by checking for a shared skb in ip6_fragment() and do
> > slow path fragmentation then. But we never needed such a check in
> > ip6_fragment(), so it's maybe better to fix it in netfilter.
> 
> So what seems to be happening is that this case in __ipv6_conntrack_in()
> triggers:
> 
>         /* Conntrack helpers need the entire reassembled packet in the
>          * POST_ROUTING hook. In case of unconfirmed connections NAT
>          * might reassign a helper, so the entire packet is also
>          * required.
>          */
>         ct = nf_ct_get(reasm, &ctinfo);
>         if (ct != NULL && !nf_ct_is_untracked(ct)) {
>                 help = nfct_help(ct);
>                 if ((help && help->helper) || !nf_ct_is_confirmed(ct)) {
>                         nf_conntrack_get_reasm(reasm);
>                         NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
>                                        (struct net_device *)in,
>                                        (struct net_device *)out,
>                                        okfn, NF_IP6_PRI_CONNTRACK + 1);
> 
> Since this code is called while walking through the fragment chain, we have
> extra references to the reassembled skb. So I think what we need to do is
> to release the fragment chain before calling NF_HOOK_THRESH() and indicate
> this to nf_ct_frag6_output() so it will stop processing the chain
> immediately.
> 
> I'll give it a try, will let you know when I have a patch for testing.

Are there patches available? I can crash my 3.12 kernel easily doing

	fping -p 20 -l -b 4000 bla

3.11.x does not expose this problem.

Regards,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


* Re: ipv6 fragmentation-related panic in netfilter
  2013-11-19 11:11       ` Wolfgang Walter
@ 2013-11-19 12:40         ` Hannes Frederic Sowa
  2013-11-19 22:27           ` Wolfgang Walter
  0 siblings, 1 reply; 8+ messages in thread
From: Hannes Frederic Sowa @ 2013-11-19 12:40 UTC (permalink / raw)
  To: Wolfgang Walter
  Cc: Patrick McHardy, Steffen Klassert, Tomas Hlavacek, netdev,
	netfilter-devel

On Tue, Nov 19, 2013 at 12:11:24PM +0100, Wolfgang Walter wrote:
> Are there patches available? I can crash my 3.12 kernel easily doing
> 
> 	fping -p 20 -l -b 4000 bla
> 
> 3.11.x does not expose this problem.

Yes, see here:

http://patchwork.ozlabs.org/patch/288967/
http://patchwork.ozlabs.org/patch/288970/

Greetings,

  Hannes


* Re: ipv6 fragmentation-related panic in netfilter
  2013-11-19 12:40         ` Hannes Frederic Sowa
@ 2013-11-19 22:27           ` Wolfgang Walter
  2013-11-20 20:43             ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Wolfgang Walter @ 2013-11-19 22:27 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Patrick McHardy, Steffen Klassert, Tomas Hlavacek, netdev,
	netfilter-devel, David Miller

On Tuesday, 19 November 2013, 13:40:32, Hannes Frederic Sowa wrote:
> On Tue, Nov 19, 2013 at 12:11:24PM +0100, Wolfgang Walter wrote:
> > Are there patches available? I can crash my 3.12 kernel easily doing
> > 
> > 	fping -p 20 -l -b 4000 bla
> > 
> > 3.11.x does not expose this problem.
> 
> Yes, see here:
> 
> http://patchwork.ozlabs.org/patch/288967/
> http://patchwork.ozlabs.org/patch/288970/
> 

This fixed it. The second patch did not apply cleanly to 3.12, due to
formatting changes or because some functions in 3.12 take

	unsigned int hooknum

instead of

	const struct nf_hook_ops *ops

Hopefully I got it right: at least it works and there are no more crashes :-).

I hope both patches go into stable soon.
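
For anyone doing the same kind of backport, the signature difference can
be sketched as below. This is a userspace illustration with hypothetical
toy_* names and a cut-down stand-in for struct nf_hook_ops. It assumes
the newer hook style reads the hook number from the hooknum member of the
ops structure, so the body of a hook stays the same and only the way the
number arrives changes.

```c
#include <assert.h>
#include <stddef.h>

struct sk_buff;     /* opaque in this sketch */
struct net_device;  /* opaque in this sketch */

#define TOY_NF_ACCEPT 1u

/* Hypothetical cut-down stand-in for struct nf_hook_ops. */
struct toy_nf_hook_ops {
	unsigned int hooknum;
};

/* Records which hook number the shared body saw, for demonstration. */
static unsigned int toy_last_hooknum;

/* Shared hook body: everything that depends on the hook number. */
static unsigned int toy_hook_body(unsigned int hooknum, struct sk_buff *skb)
{
	(void)skb;
	toy_last_hooknum = hooknum;
	return TOY_NF_ACCEPT;
}

/* Hook taking the hook number directly, as in the 3.12 tree. */
static unsigned int toy_hook_hooknum(unsigned int hooknum, struct sk_buff *skb,
				     const struct net_device *in,
				     const struct net_device *out,
				     int (*okfn)(struct sk_buff *))
{
	(void)in; (void)out; (void)okfn;
	return toy_hook_body(hooknum, skb);
}

/* Hook taking the ops pointer, as in the tree the patch was written for. */
static unsigned int toy_hook_ops(const struct toy_nf_hook_ops *ops,
				 struct sk_buff *skb,
				 const struct net_device *in,
				 const struct net_device *out,
				 int (*okfn)(struct sk_buff *))
{
	(void)in; (void)out; (void)okfn;
	return toy_hook_body(ops->hooknum, skb);
}
```

In a backport, each use of ops->hooknum in the patch becomes the plain
hooknum argument, and the function prototypes change accordingly.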

Thanks,
-- 
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts


Here is the modified version of the second patch if someone needs it (but no guarantee that it is correct):


diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index c2d8933..f66f346 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -333,11 +333,6 @@ typedef unsigned int sk_buff_data_t;
 typedef unsigned char *sk_buff_data_t;
 #endif
 
-#if defined(CONFIG_NF_DEFRAG_IPV4) || defined(CONFIG_NF_DEFRAG_IPV4_MODULE) || \
-    defined(CONFIG_NF_DEFRAG_IPV6) || defined(CONFIG_NF_DEFRAG_IPV6_MODULE)
-#define NET_SKBUFF_NF_DEFRAG_NEEDED 1
-#endif
-
 /** 
  *	struct sk_buff - socket buffer
  *	@next: Next buffer in list
@@ -370,7 +365,6 @@ typedef unsigned char *sk_buff_data_t;
  *	@protocol: Packet protocol from driver
  *	@destructor: Destruct function
  *	@nfct: Associated connection, if any
- *	@nfct_reasm: netfilter conntrack re-assembly pointer
  *	@nf_bridge: Saved data about a bridged frame - see br_netfilter.c
  *	@skb_iif: ifindex of device we arrived on
  *	@tc_index: Traffic control index
@@ -459,9 +453,6 @@ struct sk_buff {
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	struct nf_conntrack	*nfct;
 #endif
-#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
-	struct sk_buff		*nfct_reasm;
-#endif
 #ifdef CONFIG_BRIDGE_NETFILTER
 	struct nf_bridge_info	*nf_bridge;
 #endif
@@ -2605,18 +2596,6 @@ static inline void nf_conntrack_get(struct nf_conntrack *nfct)
 		atomic_inc(&nfct->use);
 }
 #endif
-#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
-static inline void nf_conntrack_get_reasm(struct sk_buff *skb)
-{
-	if (skb)
-		atomic_inc(&skb->users);
-}
-static inline void nf_conntrack_put_reasm(struct sk_buff *skb)
-{
-	if (skb)
-		kfree_skb(skb);
-}
-#endif
 #ifdef CONFIG_BRIDGE_NETFILTER
 static inline void nf_bridge_put(struct nf_bridge_info *nf_bridge)
 {
@@ -2635,10 +2614,6 @@ static inline void nf_reset(struct sk_buff *skb)
 	nf_conntrack_put(skb->nfct);
 	skb->nfct = NULL;
 #endif
-#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
-	nf_conntrack_put_reasm(skb->nfct_reasm);
-	skb->nfct_reasm = NULL;
-#endif
 #ifdef CONFIG_BRIDGE_NETFILTER
 	nf_bridge_put(skb->nf_bridge);
 	skb->nf_bridge = NULL;
@@ -2660,10 +2635,6 @@ static inline void __nf_copy(struct sk_buff *dst, const struct sk_buff *src)
 	nf_conntrack_get(src->nfct);
 	dst->nfctinfo = src->nfctinfo;
 #endif
-#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
-	dst->nfct_reasm = src->nfct_reasm;
-	nf_conntrack_get_reasm(src->nfct_reasm);
-#endif
 #ifdef CONFIG_BRIDGE_NETFILTER
 	dst->nf_bridge  = src->nf_bridge;
 	nf_bridge_get(src->nf_bridge);
@@ -2675,9 +2646,6 @@ static inline void nf_copy(struct sk_buff *dst, const struct sk_buff *src)
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 	nf_conntrack_put(dst->nfct);
 #endif
-#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
-	nf_conntrack_put_reasm(dst->nfct_reasm);
-#endif
 #ifdef CONFIG_BRIDGE_NETFILTER
 	nf_bridge_put(dst->nf_bridge);
 #endif
diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 9c4d37e..772252d 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -109,7 +109,6 @@ extern int ip_vs_conn_tab_size;
 struct ip_vs_iphdr {
 	__u32 len;	/* IPv4 simply where L4 starts
 			   IPv6 where L4 Transport Header starts */
-	__u32 thoff_reasm; /* Transport Header Offset in nfct_reasm skb */
 	__u16 fragoffs; /* IPv6 fragment offset, 0 if first frag (or not frag)*/
 	__s16 protocol;
 	__s32 flags;
@@ -117,34 +116,12 @@ struct ip_vs_iphdr {
 	union nf_inet_addr daddr;
 };
 
-/* Dependency to module: nf_defrag_ipv6 */
-#if defined(CONFIG_NF_DEFRAG_IPV6) || defined(CONFIG_NF_DEFRAG_IPV6_MODULE)
-static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
-{
-	return skb->nfct_reasm;
-}
-static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
-				      int len, void *buffer,
-				      const struct ip_vs_iphdr *ipvsh)
-{
-	if (unlikely(ipvsh->fragoffs && skb_nfct_reasm(skb)))
-		return skb_header_pointer(skb_nfct_reasm(skb),
-					  ipvsh->thoff_reasm, len, buffer);
-
-	return skb_header_pointer(skb, offset, len, buffer);
-}
-#else
-static inline struct sk_buff *skb_nfct_reasm(const struct sk_buff *skb)
-{
-	return NULL;
-}
 static inline void *frag_safe_skb_hp(const struct sk_buff *skb, int offset,
 				      int len, void *buffer,
 				      const struct ip_vs_iphdr *ipvsh)
 {
 	return skb_header_pointer(skb, offset, len, buffer);
 }
-#endif
 
 static inline void
 ip_vs_fill_ip4hdr(const void *nh, struct ip_vs_iphdr *iphdr)
@@ -171,19 +148,12 @@ ip_vs_fill_iph_skb(int af, const struct sk_buff *skb, struct ip_vs_iphdr *iphdr)
 			(struct ipv6hdr *)skb_network_header(skb);
 		iphdr->saddr.in6 = iph->saddr;
 		iphdr->daddr.in6 = iph->daddr;
-		/* ipv6_find_hdr() updates len, flags, thoff_reasm */
-		iphdr->thoff_reasm = 0;
+		/* ipv6_find_hdr() updates len, flags */
 		iphdr->len	 = 0;
 		iphdr->flags	 = 0;
 		iphdr->protocol  = ipv6_find_hdr(skb, &iphdr->len, -1,
 						 &iphdr->fragoffs,
 						 &iphdr->flags);
-		/* get proto from re-assembled packet and it's offset */
-		if (skb_nfct_reasm(skb))
-			iphdr->protocol = ipv6_find_hdr(skb_nfct_reasm(skb),
-							&iphdr->thoff_reasm,
-							-1, NULL, NULL);
-
 	} else
 #endif
 	{
diff --git a/include/net/netfilter/ipv6/nf_defrag_ipv6.h b/include/net/netfilter/ipv6/nf_defrag_ipv6.h
index fd79c9a..80a3f41 100644
--- a/include/net/netfilter/ipv6/nf_defrag_ipv6.h
+++ b/include/net/netfilter/ipv6/nf_defrag_ipv6.h
@@ -6,10 +6,7 @@ extern void nf_defrag_ipv6_enable(void);
 extern int nf_ct_frag6_init(void);
 extern void nf_ct_frag6_cleanup(void);
 extern struct sk_buff *nf_ct_frag6_gather(struct sk_buff *skb, u32 user);
-extern void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
-			       struct net_device *in,
-			       struct net_device *out,
-			       int (*okfn)(struct sk_buff *));
+void nf_ct_frag6_consume_orig(struct sk_buff *skb);
 
 struct inet_frags_ctl;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d81cff1..1371cf8 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -580,9 +580,6 @@ static void skb_release_head_state(struct sk_buff *skb)
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 	nf_conntrack_put(skb->nfct);
 #endif
-#ifdef NET_SKBUFF_NF_DEFRAG_NEEDED
-	nf_conntrack_put_reasm(skb->nfct_reasm);
-#endif
 #ifdef CONFIG_BRIDGE_NETFILTER
 	nf_bridge_put(skb->nf_bridge);
 #endif
diff --git a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
index d6e4dd8..83ab37c 100644
--- a/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_l3proto_ipv6.c
@@ -169,63 +169,13 @@ out:
 	return nf_conntrack_confirm(skb);
 }
 
-static unsigned int __ipv6_conntrack_in(struct net *net,
-					unsigned int hooknum,
-					struct sk_buff *skb,
-					const struct net_device *in,
-					const struct net_device *out,
-					int (*okfn)(struct sk_buff *))
-{
-	struct sk_buff *reasm = skb->nfct_reasm;
-	const struct nf_conn_help *help;
-	struct nf_conn *ct;
-	enum ip_conntrack_info ctinfo;
-
-	/* This packet is fragmented and has reassembled packet. */
-	if (reasm) {
-		/* Reassembled packet isn't parsed yet ? */
-		if (!reasm->nfct) {
-			unsigned int ret;
-
-			ret = nf_conntrack_in(net, PF_INET6, hooknum, reasm);
-			if (ret != NF_ACCEPT)
-				return ret;
-		}
-
-		/* Conntrack helpers need the entire reassembled packet in the
-		 * POST_ROUTING hook. In case of unconfirmed connections NAT
-		 * might reassign a helper, so the entire packet is also
-		 * required.
-		 */
-		ct = nf_ct_get(reasm, &ctinfo);
-		if (ct != NULL && !nf_ct_is_untracked(ct)) {
-			help = nfct_help(ct);
-			if ((help && help->helper) || !nf_ct_is_confirmed(ct)) {
-				nf_conntrack_get_reasm(reasm);
-				NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
-					       (struct net_device *)in,
-					       (struct net_device *)out,
-					       okfn, NF_IP6_PRI_CONNTRACK + 1);
-				return NF_DROP_ERR(-ECANCELED);
-			}
-		}
-
-		nf_conntrack_get(reasm->nfct);
-		skb->nfct = reasm->nfct;
-		skb->nfctinfo = reasm->nfctinfo;
-		return NF_ACCEPT;
-	}
-
-	return nf_conntrack_in(net, PF_INET6, hooknum, skb);
-}
-
 static unsigned int ipv6_conntrack_in(unsigned int hooknum,
 				      struct sk_buff *skb,
 				      const struct net_device *in,
 				      const struct net_device *out,
 				      int (*okfn)(struct sk_buff *))
 {
-	return __ipv6_conntrack_in(dev_net(in), hooknum, skb, in, out, okfn);
+	return nf_conntrack_in(dev_net(in), PF_INET6, hooknum, skb);
 }
 
 static unsigned int ipv6_conntrack_local(unsigned int hooknum,
@@ -239,7 +189,7 @@ static unsigned int ipv6_conntrack_local(unsigned int hooknum,
 		net_notice_ratelimited("ipv6_conntrack_local: packet too short\n");
 		return NF_ACCEPT;
 	}
-	return __ipv6_conntrack_in(dev_net(out), hooknum, skb, in, out, okfn);
+	return nf_conntrack_in(dev_net(out), PF_INET6, hooknum, skb);
 }
 
 static struct nf_hook_ops ipv6_conntrack_ops[] __read_mostly = {
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index dffdc1a..253566a 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -621,31 +621,16 @@ ret_orig:
 	return skb;
 }
 
-void nf_ct_frag6_output(unsigned int hooknum, struct sk_buff *skb,
-			struct net_device *in, struct net_device *out,
-			int (*okfn)(struct sk_buff *))
+void nf_ct_frag6_consume_orig(struct sk_buff *skb)
 {
 	struct sk_buff *s, *s2;
-	unsigned int ret = 0;
 
 	for (s = NFCT_FRAG6_CB(skb)->orig; s;) {
-		nf_conntrack_put_reasm(s->nfct_reasm);
-		nf_conntrack_get_reasm(skb);
-		s->nfct_reasm = skb;
-
 		s2 = s->next;
 		s->next = NULL;
-
-		if (ret != -ECANCELED)
-			ret = NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, s,
-					     in, out, okfn,
-					     NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
-		else
-			kfree_skb(s);
-
+		consume_skb(s);
 		s = s2;
 	}
-	nf_conntrack_put_reasm(skb);
 }
 
 static int nf_ct_net_init(struct net *net)
diff --git a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
index aacd121..581dd9e 100644
--- a/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
+++ b/net/ipv6/netfilter/nf_defrag_ipv6_hooks.c
@@ -75,8 +75,11 @@ static unsigned int ipv6_defrag(unsigned int hooknum,
 	if (reasm == skb)
 		return NF_ACCEPT;
 
-	nf_ct_frag6_output(hooknum, reasm, (struct net_device *)in,
-			   (struct net_device *)out, okfn);
+	nf_ct_frag6_consume_orig(reasm);
+
+	NF_HOOK_THRESH(NFPROTO_IPV6, hooknum, reasm,
+		       (struct net_device *) in, (struct net_device *) out,
+		       okfn, NF_IP6_PRI_CONNTRACK_DEFRAG + 1);
 
 	return NF_STOLEN;
 }
diff --git a/net/netfilter/ipvs/ip_vs_core.c b/net/netfilter/ipvs/ip_vs_core.c
index 74fd00c..3581736 100644
--- a/net/netfilter/ipvs/ip_vs_core.c
+++ b/net/netfilter/ipvs/ip_vs_core.c
@@ -1139,12 +1139,6 @@ ip_vs_out(unsigned int hooknum, struct sk_buff *skb, int af)
 	ip_vs_fill_iph_skb(af, skb, &iph);
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6) {
-		if (!iph.fragoffs && skb_nfct_reasm(skb)) {
-			struct sk_buff *reasm = skb_nfct_reasm(skb);
-			/* Save fw mark for coming frags */
-			reasm->ipvs_property = 1;
-			reasm->mark = skb->mark;
-		}
 		if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
 			int related;
 			int verdict = ip_vs_out_icmp_v6(skb, &related,
@@ -1614,12 +1608,6 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 
 #ifdef CONFIG_IP_VS_IPV6
 	if (af == AF_INET6) {
-		if (!iph.fragoffs && skb_nfct_reasm(skb)) {
-			struct sk_buff *reasm = skb_nfct_reasm(skb);
-			/* Save fw mark for coming frags. */
-			reasm->ipvs_property = 1;
-			reasm->mark = skb->mark;
-		}
 		if (unlikely(iph.protocol == IPPROTO_ICMPV6)) {
 			int related;
 			int verdict = ip_vs_in_icmp_v6(skb, &related, hooknum,
@@ -1671,9 +1659,8 @@ ip_vs_in(unsigned int hooknum, struct sk_buff *skb, int af)
 		/* sorry, all this trouble for a no-hit :) */
 		IP_VS_DBG_PKT(12, af, pp, skb, 0,
 			      "ip_vs_in: packet continues traversal as normal");
-		if (iph.fragoffs && !skb_nfct_reasm(skb)) {
+		if (iph.fragoffs) {
 			/* Fragment that couldn't be mapped to a conn entry
-			 * and don't have any pointer to a reasm skb
 			 * is missing module nf_defrag_ipv6
 			 */
 			IP_VS_DBG_RL("Unhandled frag, load nf_defrag_ipv6\n");
@@ -1756,38 +1743,6 @@ ip_vs_local_request4(unsigned int hooknum, struct sk_buff *skb,
 #ifdef CONFIG_IP_VS_IPV6
 
 /*
- * AF_INET6 fragment handling
- * Copy info from first fragment, to the rest of them.
- */
-static unsigned int
-ip_vs_preroute_frag6(unsigned int hooknum, struct sk_buff *skb,
-		     const struct net_device *in,
-		     const struct net_device *out,
-		     int (*okfn)(struct sk_buff *))
-{
-	struct sk_buff *reasm = skb_nfct_reasm(skb);
-	struct net *net;
-
-	/* Skip if not a "replay" from nf_ct_frag6_output or first fragment.
-	 * ipvs_property is set when checking first fragment
-	 * in ip_vs_in() and ip_vs_out().
-	 */
-	if (reasm)
-		IP_VS_DBG(2, "Fragment recv prop:%d\n", reasm->ipvs_property);
-	if (!reasm || !reasm->ipvs_property)
-		return NF_ACCEPT;
-
-	net = skb_net(skb);
-	if (!net_ipvs(net)->enable)
-		return NF_ACCEPT;
-
-	/* Copy stored fw mark, saved in ip_vs_{in,out} */
-	skb->mark = reasm->mark;
-
-	return NF_ACCEPT;
-}
-
-/*
  *	AF_INET6 handler in NF_INET_LOCAL_IN chain
  *	Schedule and forward packets from remote clients
  */
@@ -1924,14 +1879,6 @@ static struct nf_hook_ops ip_vs_ops[] __read_mostly = {
 		.priority	= 100,
 	},
 #ifdef CONFIG_IP_VS_IPV6
-	/* After mangle & nat fetch 2:nd fragment and following */
-	{
-		.hook		= ip_vs_preroute_frag6,
-		.owner		= THIS_MODULE,
-		.pf		= NFPROTO_IPV6,
-		.hooknum	= NF_INET_PRE_ROUTING,
-		.priority	= NF_IP6_PRI_NAT_DST + 1,
-	},
 	/* After packet filtering, change source only for VS/NAT */
 	{
 		.hook		= ip_vs_reply6,
diff --git a/net/netfilter/ipvs/ip_vs_pe_sip.c b/net/netfilter/ipvs/ip_vs_pe_sip.c
index 9ef22bd..bed5f70 100644
--- a/net/netfilter/ipvs/ip_vs_pe_sip.c
+++ b/net/netfilter/ipvs/ip_vs_pe_sip.c
@@ -65,7 +65,6 @@ static int get_callid(const char *dptr, unsigned int dataoff,
 static int
 ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb)
 {
-	struct sk_buff *reasm = skb_nfct_reasm(skb);
 	struct ip_vs_iphdr iph;
 	unsigned int dataoff, datalen, matchoff, matchlen;
 	const char *dptr;
@@ -79,15 +78,10 @@ ip_vs_sip_fill_param(struct ip_vs_conn_param *p, struct sk_buff *skb)
 	/* todo: IPv6 fragments:
 	 *       I think this only should be done for the first fragment. /HS
 	 */
-	if (reasm) {
-		skb = reasm;
-		dataoff = iph.thoff_reasm + sizeof(struct udphdr);
-	} else
-		dataoff = iph.len + sizeof(struct udphdr);
+	dataoff = iph.len + sizeof(struct udphdr);
 
 	if (dataoff >= skb->len)
 		return -EINVAL;
-	/* todo: Check if this will mess-up the reasm skb !!! /HS */
 	retc = skb_linearize(skb);
 	if (retc < 0)
 		return retc;
-- 
1.8.4.3




* Re: ipv6 fragmentation-related panic in netfilter
  2013-11-19 22:27           ` Wolfgang Walter
@ 2013-11-20 20:43             ` David Miller
  0 siblings, 0 replies; 8+ messages in thread
From: David Miller @ 2013-11-20 20:43 UTC (permalink / raw)
  To: linux; +Cc: hannes, kaber, steffen.klassert, tmshlvck, netdev, netfilter-devel

From: Wolfgang Walter <linux@stwm.de>
Date: Tue, 19 Nov 2013 23:27:40 +0100

> On Tuesday, 19 November 2013, 13:40:32, Hannes Frederic Sowa wrote:
>> Yes, see here:
>> 
>> http://patchwork.ozlabs.org/patch/288967/
>> http://patchwork.ozlabs.org/patch/288970/
>> 
> 
> This fixed it. The second patch did not cleanly apply to 3.12 due to
> formatting changes or because some functions in 3.12 get
> 
> 	unsigned int hooknum
> 
> instead of
> 
> 	const struct nf_hook_ops *ops
> 
> I hopefully got it right: at least it works and no crashes any more :-).
> 
> I hope both patches go into stable soon.

They are both queued up.

