"Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed

All of lore.kernel.org
 help / color / mirror / Atom feed

* "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-27 21:47 ` Linus Lüssing
  0 siblings, 0 replies; 22+ messages in thread
From: Linus Lüssing @ 2019-01-27 21:47 UTC (permalink / raw)
  To: netfilter-devel-u79uwXL29TY76Z2rM5mHXA
  Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r

Hi,

I was trying to implement a multicast-to-multi-unicast conversion
in batman-adv with the following patch:

https://patchwork.open-mesh.org/patch/17729/

However, on OpenWrt with a 4.9.146 kernel I get a
"Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list".

This only happens upon sending a SIGTERM to the network manager
"netifd" (so upon network shutdown). And only if the node is connected
to mesh of reasonable size, so if there is a certain amount of
multicast traffic for the multicast-to-multi-unicast patch to work on.

Upon normal operation, no such crash seems to occur.

The crash itself is triggered by the:

  BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));

in here:

https://elixir.bootlin.com/linux/v4.9.146/source/net/netfilter/nf_conntrack_core.c#L354

What confuses me a bit is, that the multicast-to-multi-unicast
conversion uses the same/similar, simple skb_copy() approach like the
"classic broadcast flooding" approach in batman-adv so far. The latter too
transmits three redundant frames via skb_copy() to increase
reliability for Wifi broadcast packets.

One difference is that the broadcast flooding adds a bit of
delay between each transmission. Which the multicast-to-multi-unicast
doesn't.

Looking at "git log net/netfilter/nf_conntrack_core.c" I noticed
"netfilter: nfnetlink_queue: resolve clash for unconfirmed
conntracks" (368982cd7). Which says:

"In nfqueue, two consecutive skbuffs may race to create the conntrack
 entry. Hence, the one that loses the race gets dropped due to clash in
 the insertion into the hashes from the nf_conntrack_confirm() path."

This patch is only part of >= 4.18, so not part of the firmware we use
yet. Could this issue somehow be related?

Other than that I was wondering whether we might be missing to
reset something after skb_copy()-ing. We do a "skb->protocol =
htons(ETH_P_BATMAN)" right before the dev_queue_xmit(skb) call in
batman-adv which sends the encapsulated frame into the
mesh. And we do a nf_reset(skb) after decapsulating a frame
received from the mesh. But maybe that is not enough?

Ticket this issue was reported at:

https://github.com/freifunk-gluon/gluon/issues/1468

Regards, Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-27 21:47 ` Linus Lüssing
  0 siblings, 0 replies; 22+ messages in thread
From: Linus Lüssing @ 2019-01-27 21:47 UTC (permalink / raw)
  To: netfilter-devel; +Cc: b.a.t.m.a.n

Hi,

I was trying to implement a multicast-to-multi-unicast conversion
in batman-adv with the following patch:

https://patchwork.open-mesh.org/patch/17729/

However, on OpenWrt with a 4.9.146 kernel I get a
"Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list".

This only happens upon sending a SIGTERM to the network manager
"netifd" (so upon network shutdown). And only if the node is connected
to mesh of reasonable size, so if there is a certain amount of
multicast traffic for the multicast-to-multi-unicast patch to work on.

Upon normal operation, no such crash seems to occur.

The crash itself is triggered by the:

  BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));

in here:

https://elixir.bootlin.com/linux/v4.9.146/source/net/netfilter/nf_conntrack_core.c#L354

What confuses me a bit is, that the multicast-to-multi-unicast
conversion uses the same/similar, simple skb_copy() approach like the
"classic broadcast flooding" approach in batman-adv so far. The latter too
transmits three redundant frames via skb_copy() to increase
reliability for Wifi broadcast packets.

One difference is that the broadcast flooding adds a bit of
delay between each transmission. Which the multicast-to-multi-unicast
doesn't.

Looking at "git log net/netfilter/nf_conntrack_core.c" I noticed
"netfilter: nfnetlink_queue: resolve clash for unconfirmed
conntracks" (368982cd7). Which says:

"In nfqueue, two consecutive skbuffs may race to create the conntrack
 entry. Hence, the one that loses the race gets dropped due to clash in
 the insertion into the hashes from the nf_conntrack_confirm() path."

This patch is only part of >= 4.18, so not part of the firmware we use
yet. Could this issue somehow be related?

Other than that I was wondering whether we might be missing to
reset something after skb_copy()-ing. We do a "skb->protocol =
htons(ETH_P_BATMAN)" right before the dev_queue_xmit(skb) call in
batman-adv which sends the encapsulated frame into the
mesh. And we do a nf_reset(skb) after decapsulating a frame
received from the mesh. But maybe that is not enough?

Ticket this issue was reported at:

https://github.com/freifunk-gluon/gluon/issues/1468

Regards, Linus

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-27 21:47 ` [B.A.T.M.A.N.] " Linus Lüssing
@ 2019-01-27 22:48   ` Florian Westphal
  -1 siblings, 0 replies; 22+ messages in thread
From: Florian Westphal @ 2019-01-27 22:48 UTC (permalink / raw)
  To: Linus Lüssing
  Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA

Linus Lüssing <linus.luessing-djzkFPsfvsizQB+pC5nmwQ@public.gmane.org> wrote:
> This only happens upon sending a SIGTERM to the network manager
> "netifd" (so upon network shutdown). And only if the node is connected
> to mesh of reasonable size, so if there is a certain amount of
> multicast traffic for the multicast-to-multi-unicast patch to work on.

Does this still trigger when you do

nf_reset(newskb);

after skb_copy()?

> One difference is that the broadcast flooding adds a bit of
> delay between each transmission. Which the multicast-to-multi-unicast
> doesn't.

Are those transmits done asynchronously?

conntrack assumes exclusive access to skb->nfct if the conntrack
entry isn't in main hash table.

(i.e, when nf_ct_is_confirmed returns false).

> "In nfqueue, two consecutive skbuffs may race to create the conntrack
>  entry. Hence, the one that loses the race gets dropped due to clash in
>  the insertion into the hashes from the nf_conntrack_confirm() path."
> 
> This patch is only part of >= 4.18, so not part of the firmware we use
> yet. Could this issue somehow be related?

Possible, but I don't think its likely.
In the nfquee case there is asynchronous processing, but
no skb can share the same conntrack entry unless the entry is already
in the conntrack hash table.

> Other than that I was wondering whether we might be missing to
> reset something after skb_copy()-ing. We do a "skb->protocol =
> htons(ETH_P_BATMAN)" right before the dev_queue_xmit(skb) call in
> batman-adv which sends the encapsulated frame into the
> mesh. And we do a nf_reset(skb) after decapsulating a frame
> received from the mesh. But maybe that is not enough?

I suggest nf_reset() on xmit, if you can be sure that the xmit
won't occur back-to-self (netns case is fine, as skb scrubbing
resets skb nfct anyway) and the skb isn't on a rexmit list somewhere.
(clone is fine, only shared skb would break).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-27 22:48   ` Florian Westphal
  0 siblings, 0 replies; 22+ messages in thread
From: Florian Westphal @ 2019-01-27 22:48 UTC (permalink / raw)
  To: Linus Lüssing; +Cc: netfilter-devel, b.a.t.m.a.n

Linus Lüssing <linus.luessing@c0d3.blue> wrote:
> This only happens upon sending a SIGTERM to the network manager
> "netifd" (so upon network shutdown). And only if the node is connected
> to mesh of reasonable size, so if there is a certain amount of
> multicast traffic for the multicast-to-multi-unicast patch to work on.

Does this still trigger when you do

nf_reset(newskb);

after skb_copy()?

> One difference is that the broadcast flooding adds a bit of
> delay between each transmission. Which the multicast-to-multi-unicast
> doesn't.

Are those transmits done asynchronously?

conntrack assumes exclusive access to skb->nfct if the conntrack
entry isn't in main hash table.

(i.e, when nf_ct_is_confirmed returns false).

> "In nfqueue, two consecutive skbuffs may race to create the conntrack
>  entry. Hence, the one that loses the race gets dropped due to clash in
>  the insertion into the hashes from the nf_conntrack_confirm() path."
> 
> This patch is only part of >= 4.18, so not part of the firmware we use
> yet. Could this issue somehow be related?

Possible, but I don't think its likely.
In the nfquee case there is asynchronous processing, but
no skb can share the same conntrack entry unless the entry is already
in the conntrack hash table.

> Other than that I was wondering whether we might be missing to
> reset something after skb_copy()-ing. We do a "skb->protocol =
> htons(ETH_P_BATMAN)" right before the dev_queue_xmit(skb) call in
> batman-adv which sends the encapsulated frame into the
> mesh. And we do a nf_reset(skb) after decapsulating a frame
> received from the mesh. But maybe that is not enough?

I suggest nf_reset() on xmit, if you can be sure that the xmit
won't occur back-to-self (netns case is fine, as skb scrubbing
resets skb nfct anyway) and the skb isn't on a rexmit list somewhere.
(clone is fine, only shared skb would break).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-27 22:48   ` [B.A.T.M.A.N.] " Florian Westphal
@ 2019-01-28 13:35       ` Chieh-Min Wang
  -1 siblings, 0 replies; 22+ messages in thread
From: Chieh-Min Wang @ 2019-01-28 13:35 UTC (permalink / raw)
  To: Florian Westphal
  Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA

I think this is the same issue as this one.

http://patchwork.ozlabs.org/patch/995825/

Florian Westphal <fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org> 於 2019年1月28日 週一 上午6:51寫道：
>
> Linus Lüssing <linus.luessing-djzkFPsfvsizQB+pC5nmwQ@public.gmane.org> wrote:
> > This only happens upon sending a SIGTERM to the network manager
> > "netifd" (so upon network shutdown). And only if the node is connected
> > to mesh of reasonable size, so if there is a certain amount of
> > multicast traffic for the multicast-to-multi-unicast patch to work on.
>
> Does this still trigger when you do
>
> nf_reset(newskb);
>
> after skb_copy()?
>
> > One difference is that the broadcast flooding adds a bit of
> > delay between each transmission. Which the multicast-to-multi-unicast
> > doesn't.
>
> Are those transmits done asynchronously?
>
> conntrack assumes exclusive access to skb->nfct if the conntrack
> entry isn't in main hash table.
>
> (i.e, when nf_ct_is_confirmed returns false).
>
> > "In nfqueue, two consecutive skbuffs may race to create the conntrack
> >  entry. Hence, the one that loses the race gets dropped due to clash in
> >  the insertion into the hashes from the nf_conntrack_confirm() path."
> >
> > This patch is only part of >= 4.18, so not part of the firmware we use
> > yet. Could this issue somehow be related?
>
> Possible, but I don't think its likely.
> In the nfquee case there is asynchronous processing, but
> no skb can share the same conntrack entry unless the entry is already
> in the conntrack hash table.
>
> > Other than that I was wondering whether we might be missing to
> > reset something after skb_copy()-ing. We do a "skb->protocol =
> > htons(ETH_P_BATMAN)" right before the dev_queue_xmit(skb) call in
> > batman-adv which sends the encapsulated frame into the
> > mesh. And we do a nf_reset(skb) after decapsulating a frame
> > received from the mesh. But maybe that is not enough?
>
> I suggest nf_reset() on xmit, if you can be sure that the xmit
> won't occur back-to-self (netns case is fine, as skb scrubbing
> resets skb nfct anyway) and the skb isn't on a rexmit list somewhere.
> (clone is fine, only shared skb would break).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-28 13:35       ` Chieh-Min Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Chieh-Min Wang @ 2019-01-28 13:35 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Linus Lüssing, netfilter-devel, b.a.t.m.a.n

I think this is the same issue as this one.

http://patchwork.ozlabs.org/patch/995825/

Florian Westphal <fw@strlen.de> 於 2019年1月28日 週一 上午6:51寫道：
>
> Linus Lüssing <linus.luessing@c0d3.blue> wrote:
> > This only happens upon sending a SIGTERM to the network manager
> > "netifd" (so upon network shutdown). And only if the node is connected
> > to mesh of reasonable size, so if there is a certain amount of
> > multicast traffic for the multicast-to-multi-unicast patch to work on.
>
> Does this still trigger when you do
>
> nf_reset(newskb);
>
> after skb_copy()?
>
> > One difference is that the broadcast flooding adds a bit of
> > delay between each transmission. Which the multicast-to-multi-unicast
> > doesn't.
>
> Are those transmits done asynchronously?
>
> conntrack assumes exclusive access to skb->nfct if the conntrack
> entry isn't in main hash table.
>
> (i.e, when nf_ct_is_confirmed returns false).
>
> > "In nfqueue, two consecutive skbuffs may race to create the conntrack
> >  entry. Hence, the one that loses the race gets dropped due to clash in
> >  the insertion into the hashes from the nf_conntrack_confirm() path."
> >
> > This patch is only part of >= 4.18, so not part of the firmware we use
> > yet. Could this issue somehow be related?
>
> Possible, but I don't think its likely.
> In the nfquee case there is asynchronous processing, but
> no skb can share the same conntrack entry unless the entry is already
> in the conntrack hash table.
>
> > Other than that I was wondering whether we might be missing to
> > reset something after skb_copy()-ing. We do a "skb->protocol =
> > htons(ETH_P_BATMAN)" right before the dev_queue_xmit(skb) call in
> > batman-adv which sends the encapsulated frame into the
> > mesh. And we do a nf_reset(skb) after decapsulating a frame
> > received from the mesh. But maybe that is not enough?
>
> I suggest nf_reset() on xmit, if you can be sure that the xmit
> won't occur back-to-self (netns case is fine, as skb scrubbing
> resets skb nfct anyway) and the skb isn't on a rexmit list somewhere.
> (clone is fine, only shared skb would break).

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-28 13:35       ` [B.A.T.M.A.N.] " Chieh-Min Wang
@ 2019-01-28 13:39           ` Florian Westphal
  -1 siblings, 0 replies; 22+ messages in thread
From: Florian Westphal @ 2019-01-28 13:39 UTC (permalink / raw)
  To: Chieh-Min Wang
  Cc: netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, Florian Westphal

Chieh-Min Wang <chiehmin18-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I think this is the same issue as this one.
> 
> http://patchwork.ozlabs.org/patch/995825/

Yes, likely.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-28 13:39           ` Florian Westphal
  0 siblings, 0 replies; 22+ messages in thread
From: Florian Westphal @ 2019-01-28 13:39 UTC (permalink / raw)
  To: Chieh-Min Wang
  Cc: Florian Westphal, Linus Lüssing, netfilter-devel, b.a.t.m.a.n

Chieh-Min Wang <chiehmin18@gmail.com> wrote:
> I think this is the same issue as this one.
> 
> http://patchwork.ozlabs.org/patch/995825/

Yes, likely.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-28 13:39           ` [B.A.T.M.A.N.] " Florian Westphal
@ 2019-01-28 13:50               ` Pablo Neira Ayuso
  -1 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2019-01-28 13:50 UTC (permalink / raw)
  To: Florian Westphal
  Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA, Chieh-Min Wang

On Mon, Jan 28, 2019 at 02:39:40PM +0100, Florian Westphal wrote:
> Chieh-Min Wang <chiehmin18-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > I think this is the same issue as this one.
> > 
> > http://patchwork.ozlabs.org/patch/995825/
> 
> Yes, likely.

I see.

I don't think letting the packet go through is a good idea. Not sure
NAT will work fine, packets would go through being unmangled? I think
we should still drop the packet until we fix this.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-28 13:50               ` Pablo Neira Ayuso
  0 siblings, 0 replies; 22+ messages in thread
From: Pablo Neira Ayuso @ 2019-01-28 13:50 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Chieh-Min Wang, Linus Lüssing, netfilter-devel, b.a.t.m.a.n

On Mon, Jan 28, 2019 at 02:39:40PM +0100, Florian Westphal wrote:
> Chieh-Min Wang <chiehmin18@gmail.com> wrote:
> > I think this is the same issue as this one.
> > 
> > http://patchwork.ozlabs.org/patch/995825/
> 
> Yes, likely.

I see.

I don't think letting the packet go through is a good idea. Not sure
NAT will work fine, packets would go through being unmangled? I think
we should still drop the packet until we fix this.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-28 13:50               ` [B.A.T.M.A.N.] " Pablo Neira Ayuso
@ 2019-01-28 14:01                 ` Florian Westphal
  -1 siblings, 0 replies; 22+ messages in thread
From: Florian Westphal @ 2019-01-28 14:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, Florian Westphal,
	Chieh-Min Wang

Pablo Neira Ayuso <pablo-Cap9r6Oaw4JrovVCs/uTlw@public.gmane.org> wrote:
> I don't think letting the packet go through is a good idea. Not sure
> NAT will work fine, packets would go through being unmangled? I think
> we should still drop the packet until we fix this.

Unfortuntely this is still a band-aid solution, nfqueue + bridge doesn't
work when mcast/flood is involved.

Problematic cases are NAT (several bindings on same conntrack
simultaneously) and extension realloction.
They are not a problem in most cases due to prealloced space and
because extensions are commonly added before bridge starts to clone
for flooding.

For NAT, the race window is small and iirc we changed nat core to
just warn in case the nat bit is already set.

I think it will work fine in most cases with this patch (i.e.,
witch accept verdict) though; it is better than what we do now.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-28 14:01                 ` Florian Westphal
  0 siblings, 0 replies; 22+ messages in thread
From: Florian Westphal @ 2019-01-28 14:01 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Chieh-Min Wang, Linus Lüssing,
	netfilter-devel, b.a.t.m.a.n

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> I don't think letting the packet go through is a good idea. Not sure
> NAT will work fine, packets would go through being unmangled? I think
> we should still drop the packet until we fix this.

Unfortuntely this is still a band-aid solution, nfqueue + bridge doesn't
work when mcast/flood is involved.

Problematic cases are NAT (several bindings on same conntrack
simultaneously) and extension realloction.
They are not a problem in most cases due to prealloced space and
because extensions are commonly added before bridge starts to clone
for flooding.

For NAT, the race window is small and iirc we changed nat core to
just warn in case the nat bit is already set.

I think it will work fine in most cases with this patch (i.e.,
witch accept verdict) though; it is better than what we do now.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-28 13:50               ` [B.A.T.M.A.N.] " Pablo Neira Ayuso
@ 2019-01-28 14:03                 ` Chieh-Min Wang
  -1 siblings, 0 replies; 22+ messages in thread
From: Chieh-Min Wang @ 2019-01-28 14:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, Florian Westphal

I think 71d8c47fc653711c4(netfilter: conntrack: introduce clash
resolution on insertion race) is doing the same logic for resolving
conntrack clashing.

The first packet who win the race should handle the NAT stuff on the
conntrack right?

Pablo Neira Ayuso <pablo-Cap9r6Oaw4JrovVCs/uTlw@public.gmane.org> 於 2019年1月28日 週一 下午9:50寫道：
>
> On Mon, Jan 28, 2019 at 02:39:40PM +0100, Florian Westphal wrote:
> > Chieh-Min Wang <chiehmin18-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > I think this is the same issue as this one.
> > >
> > > http://patchwork.ozlabs.org/patch/995825/
> >
> > Yes, likely.
>
> I see.
>
> I don't think letting the packet go through is a good idea. Not sure
> NAT will work fine, packets would go through being unmangled? I think
> we should still drop the packet until we fix this.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-28 14:03                 ` Chieh-Min Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Chieh-Min Wang @ 2019-01-28 14:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Florian Westphal, Linus Lüssing, netfilter-devel, b.a.t.m.a.n

I think 71d8c47fc653711c4(netfilter: conntrack: introduce clash
resolution on insertion race) is doing the same logic for resolving
conntrack clashing.

The first packet who win the race should handle the NAT stuff on the
conntrack right?

Pablo Neira Ayuso <pablo@netfilter.org> 於 2019年1月28日 週一 下午9:50寫道：
>
> On Mon, Jan 28, 2019 at 02:39:40PM +0100, Florian Westphal wrote:
> > Chieh-Min Wang <chiehmin18@gmail.com> wrote:
> > > I think this is the same issue as this one.
> > >
> > > http://patchwork.ozlabs.org/patch/995825/
> >
> > Yes, likely.
>
> I see.
>
> I don't think letting the packet go through is a good idea. Not sure
> NAT will work fine, packets would go through being unmangled? I think
> we should still drop the packet until we fix this.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-28 14:03                 ` [B.A.T.M.A.N.] " Chieh-Min Wang
@ 2019-01-28 14:13                     ` Florian Westphal
  -1 siblings, 0 replies; 22+ messages in thread
From: Florian Westphal @ 2019-01-28 14:13 UTC (permalink / raw)
  To: Chieh-Min Wang
  Cc: netfilter-devel-u79uwXL29TY76Z2rM5mHXA,
	b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r, Florian Westphal,
	Pablo Neira Ayuso

Chieh-Min Wang <chiehmin18-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> I think 71d8c47fc653711c4(netfilter: conntrack: introduce clash
> resolution on insertion race) is doing the same logic for resolving
> conntrack clashing.

No, that commit dealsl with the case where two skbs have different
conntrack objects but where tuples are the same.

In nfqueue+bridge flood case the skbs point to the same conntrack
object.

Maybe one way to fix this would be to let nfqueue perform a deep
copy of skb->_nfct in case conntrack is unconfirmed and skb_shared()
is true.

But that would of course cause drop for l4 protocols that do not support
clash resolution.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-28 14:13                     ` Florian Westphal
  0 siblings, 0 replies; 22+ messages in thread
From: Florian Westphal @ 2019-01-28 14:13 UTC (permalink / raw)
  To: Chieh-Min Wang
  Cc: Pablo Neira Ayuso, Florian Westphal, Linus Lüssing,
	netfilter-devel, b.a.t.m.a.n

Chieh-Min Wang <chiehmin18@gmail.com> wrote:
> I think 71d8c47fc653711c4(netfilter: conntrack: introduce clash
> resolution on insertion race) is doing the same logic for resolving
> conntrack clashing.

No, that commit dealsl with the case where two skbs have different
conntrack objects but where tuples are the same.

In nfqueue+bridge flood case the skbs point to the same conntrack
object.

Maybe one way to fix this would be to let nfqueue perform a deep
copy of skb->_nfct in case conntrack is unconfirmed and skb_shared()
is true.

But that would of course cause drop for l4 protocols that do not support
clash resolution.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-28 14:13                     ` [B.A.T.M.A.N.] " Florian Westphal
@ 2019-01-28 14:16                         ` Chieh-Min Wang
  -1 siblings, 0 replies; 22+ messages in thread
From: Chieh-Min Wang @ 2019-01-28 14:16 UTC (permalink / raw)
  To: Florian Westphal
  Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA, Pablo Neira Ayuso

Yes, I understand the cases are different.

However, the result after resolving clashing will cause two skb point
to the same conntrack.
The first skb with confirmed conntrack may not pass through the NAT
when the second skb resolve clashing?

Florian Westphal <fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org> 於 2019年1月28日 週一 下午10:13寫道：
>
> Chieh-Min Wang <chiehmin18-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > I think 71d8c47fc653711c4(netfilter: conntrack: introduce clash
> > resolution on insertion race) is doing the same logic for resolving
> > conntrack clashing.
>
> No, that commit dealsl with the case where two skbs have different
> conntrack objects but where tuples are the same.
>
> In nfqueue+bridge flood case the skbs point to the same conntrack
> object.
>
> Maybe one way to fix this would be to let nfqueue perform a deep
> copy of skb->_nfct in case conntrack is unconfirmed and skb_shared()
> is true.
>
> But that would of course cause drop for l4 protocols that do not support
> clash resolution.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-28 14:16                         ` Chieh-Min Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Chieh-Min Wang @ 2019-01-28 14:16 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Linus Lüssing, netfilter-devel, b.a.t.m.a.n

Yes, I understand the cases are different.

However, the result after resolving clashing will cause two skb point
to the same conntrack.
The first skb with confirmed conntrack may not pass through the NAT
when the second skb resolve clashing?

Florian Westphal <fw@strlen.de> 於 2019年1月28日 週一 下午10:13寫道：
>
> Chieh-Min Wang <chiehmin18@gmail.com> wrote:
> > I think 71d8c47fc653711c4(netfilter: conntrack: introduce clash
> > resolution on insertion race) is doing the same logic for resolving
> > conntrack clashing.
>
> No, that commit dealsl with the case where two skbs have different
> conntrack objects but where tuples are the same.
>
> In nfqueue+bridge flood case the skbs point to the same conntrack
> object.
>
> Maybe one way to fix this would be to let nfqueue perform a deep
> copy of skb->_nfct in case conntrack is unconfirmed and skb_shared()
> is true.
>
> But that would of course cause drop for l4 protocols that do not support
> clash resolution.
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-28 14:16                         ` [B.A.T.M.A.N.] " Chieh-Min Wang
@ 2019-01-28 14:25                             ` Chieh-Min Wang
  -1 siblings, 0 replies; 22+ messages in thread
From: Chieh-Min Wang @ 2019-01-28 14:25 UTC (permalink / raw)
  To: Florian Westphal
  Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r,
	netfilter-devel-u79uwXL29TY76Z2rM5mHXA, Pablo Neira Ayuso

The point I am clarifying is that with NFQUEUE working on the system.

After resolving clashing using 71d8c47fc6537, it may result to two skb
holding the same conntrack on different core traveling in the
netfilter just like the nfqueue + multicast case.

Chieh-Min Wang <chiehmin18-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 於 2019年1月28日 週一 下午10:16寫道：
>
> Yes, I understand the cases are different.
>
> However, the result after resolving clashing will cause two skb point
> to the same conntrack.
> The first skb with confirmed conntrack may not pass through the NAT
> when the second skb resolve clashing?
>
> Florian Westphal <fw-HFFVJYpyMKqzQB+pC5nmwQ@public.gmane.org> 於 2019年1月28日 週一 下午10:13寫道：
> >
> > Chieh-Min Wang <chiehmin18-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > > I think 71d8c47fc653711c4(netfilter: conntrack: introduce clash
> > > resolution on insertion race) is doing the same logic for resolving
> > > conntrack clashing.
> >
> > No, that commit dealsl with the case where two skbs have different
> > conntrack objects but where tuples are the same.
> >
> > In nfqueue+bridge flood case the skbs point to the same conntrack
> > object.
> >
> > Maybe one way to fix this would be to let nfqueue perform a deep
> > copy of skb->_nfct in case conntrack is unconfirmed and skb_shared()
> > is true.
> >
> > But that would of course cause drop for l4 protocols that do not support
> > clash resolution.
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-28 14:25                             ` Chieh-Min Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Chieh-Min Wang @ 2019-01-28 14:25 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Pablo Neira Ayuso, Linus Lüssing, netfilter-devel, b.a.t.m.a.n

The point I am clarifying is that with NFQUEUE working on the system.

After resolving clashing using 71d8c47fc6537, it may result to two skb
holding the same conntrack on different core traveling in the
netfilter just like the nfqueue + multicast case.

Chieh-Min Wang <chiehmin18@gmail.com> 於 2019年1月28日 週一 下午10:16寫道：
>
> Yes, I understand the cases are different.
>
> However, the result after resolving clashing will cause two skb point
> to the same conntrack.
> The first skb with confirmed conntrack may not pass through the NAT
> when the second skb resolve clashing?
>
> Florian Westphal <fw@strlen.de> 於 2019年1月28日 週一 下午10:13寫道：
> >
> > Chieh-Min Wang <chiehmin18@gmail.com> wrote:
> > > I think 71d8c47fc653711c4(netfilter: conntrack: introduce clash
> > > resolution on insertion race) is doing the same logic for resolving
> > > conntrack clashing.
> >
> > No, that commit dealsl with the case where two skbs have different
> > conntrack objects but where tuples are the same.
> >
> > In nfqueue+bridge flood case the skbs point to the same conntrack
> > object.
> >
> > Maybe one way to fix this would be to let nfqueue perform a deep
> > copy of skb->_nfct in case conntrack is unconfirmed and skb_shared()
> > is true.
> >
> > But that would of course cause drop for l4 protocols that do not support
> > clash resolution.
> >

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
  2019-01-27 21:47 ` [B.A.T.M.A.N.] " Linus Lüssing
@ 2019-01-29  9:07   ` Linus Lüssing
  -1 siblings, 0 replies; 22+ messages in thread
From: Linus Lüssing @ 2019-01-29  9:07 UTC (permalink / raw)
  To: netfilter-devel-u79uwXL29TY76Z2rM5mHXA
  Cc: b.a.t.m.a.n-ZwoEplunGu2X36UT3dwllkB+6BGkLq7r

On Sun, Jan 27, 2019 at 10:47:08PM +0100, Linus Lüssing wrote:
[...]
> The crash itself is triggered by the:
> 
>   BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
> 
> in here:
> 
> https://elixir.bootlin.com/linux/v4.9.146/source/net/netfilter/nf_conntrack_core.c#L354

I had tried the nf_reset()s and Wang's patch but with no success.

Skimming through the code I noticed that there aren't that many
opportunities for the hnnode to become zero. There are several
hlist_nulls_del_rcu(), but no hlist_nulls_del_init_rcu()s for
instance.

That started to make me wonder whether something from "outside"
might be setting the hnnode to zero - and yeah...

I missed that batadv_send_skb_unicast() always frees/consumes the
skb... and I was freeing the skb myself if that call returned
!NET_XMIT_SUCCESS. So a double kfree_skb()... I'm a bit surprised
that things did not crash more often...

Sorry for the noise :-(. But thanks for all the help and quick
responses!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [B.A.T.M.A.N.] "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list"
@ 2019-01-29  9:07   ` Linus Lüssing
  0 siblings, 0 replies; 22+ messages in thread
From: Linus Lüssing @ 2019-01-29  9:07 UTC (permalink / raw)
  To: netfilter-devel; +Cc: b.a.t.m.a.n

On Sun, Jan 27, 2019 at 10:47:08PM +0100, Linus Lüssing wrote:
[...]
> The crash itself is triggered by the:
> 
>   BUG_ON(hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode));
> 
> in here:
> 
> https://elixir.bootlin.com/linux/v4.9.146/source/net/netfilter/nf_conntrack_core.c#L354

I had tried the nf_reset()s and Wang's patch but with no success.

Skimming through the code I noticed that there aren't that many
opportunities for the hnnode to become zero. There are several
hlist_nulls_del_rcu(), but no hlist_nulls_del_init_rcu()s for
instance.

That started to make me wonder whether something from "outside"
might be setting the hnnode to zero - and yeah...

I missed that batadv_send_skb_unicast() always frees/consumes the
skb... and I was freeing the skb myself if that call returned
!NET_XMIT_SUCCESS. So a double kfree_skb()... I'm a bit surprised
that things did not crash more often...

Sorry for the noise :-(. But thanks for all the help and quick
responses!

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2019-01-29  9:07 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-27 21:47 "Kernel bug detected [...] nf_ct_del_from_dying_or_unconfirmed_list" Linus Lüssing
2019-01-27 21:47 ` [B.A.T.M.A.N.] " Linus Lüssing
2019-01-27 22:48 ` Florian Westphal
2019-01-27 22:48   ` [B.A.T.M.A.N.] " Florian Westphal
     [not found]   ` <20190127224822.lsagihtfiuvxyool-E0PNVn5OA6ohrxcnuTQ+TQ@public.gmane.org>
2019-01-28 13:35     ` Chieh-Min Wang
2019-01-28 13:35       ` [B.A.T.M.A.N.] " Chieh-Min Wang
     [not found]       ` <CALJUYjOq-xpjorsfnMRthzmC+iuDTVOPHXRb2p3ahU248Jrw4w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-01-28 13:39         ` Florian Westphal
2019-01-28 13:39           ` [B.A.T.M.A.N.] " Florian Westphal
     [not found]           ` <20190128133940.jxwuscyi2wvbfb52-E0PNVn5OA6ohrxcnuTQ+TQ@public.gmane.org>
2019-01-28 13:50             ` Pablo Neira Ayuso
2019-01-28 13:50               ` [B.A.T.M.A.N.] " Pablo Neira Ayuso
2019-01-28 14:01               ` Florian Westphal
2019-01-28 14:01                 ` [B.A.T.M.A.N.] " Florian Westphal
2019-01-28 14:03               ` Chieh-Min Wang
2019-01-28 14:03                 ` [B.A.T.M.A.N.] " Chieh-Min Wang
     [not found]                 ` <CALJUYjO4=pDT0COGJRx2YWDMiEJTpa2CdqQqxndd93khVDZHjA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-01-28 14:13                   ` Florian Westphal
2019-01-28 14:13                     ` [B.A.T.M.A.N.] " Florian Westphal
     [not found]                     ` <20190128141317.pxq7vklx346bv2bu-E0PNVn5OA6ohrxcnuTQ+TQ@public.gmane.org>
2019-01-28 14:16                       ` Chieh-Min Wang
2019-01-28 14:16                         ` [B.A.T.M.A.N.] " Chieh-Min Wang
     [not found]                         ` <CALJUYjMM7EJVxxhh_q=607yn7OXhfnhrnk+m=tQ7C8GJjOCDcA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-01-28 14:25                           ` Chieh-Min Wang
2019-01-28 14:25                             ` [B.A.T.M.A.N.] " Chieh-Min Wang
2019-01-29  9:07 ` Linus Lüssing
2019-01-29  9:07   ` [B.A.T.M.A.N.] " Linus Lüssing

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.