All of lore.kernel.org
 help / color / mirror / Atom feed
* nfqueue & bridge netfilter considered broken
@ 2016-09-02  9:08 Florian Westphal
  2016-09-02  9:58 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2016-09-02  9:08 UTC (permalink / raw)
  To: netfilter-devel

Hi.

This is a note to summarize state of bridge + br_netfilter + nfqueue.

TL;DR: I am giving up.  I see no way to fix this in a sane fashion.

What I tried:
I - discard extra nfct entry when cloning.  Works, but obviously not
 compatible in any way (the clones are INVALID).

II - add locking where needed.  It gets out of hand rather quickly,
 it is also pushed on everyone else rather than just those places
 that cause clones with unconfirmed conntrack entries (only places
 that i know of is mcast routing and the bridge).

III - serialize on reinject.
 Idea was to change nfqnl_recv_verdict() to (instead of nf_reinject() )
 call a wrapper, do_nf_reinject, which essentially does:

static void do_nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
{
#if IS_ENABLED(NF_CONNTRACK)
   struct nf_conn *ct = entry->skb->nfct;

  if (!ct || nf_ct_is_confirmed(ct)) {
       nf_reinject(entry, verdict);
       return 0;
  }

  if (skb_cloned(entry->skb)) {
      module_get(THIS_MODULE);
      push_to_work_queue_for_reinject(entry, verdict);
      return;
  }
#endif
  nf_reinject(entry, verdict);
}

The work queue then makes sure that we feed such skb one-by-one.

Aside from this being ugly as f there are two problems:
 1. skb_cloned() being true does NOT mean that we have to serialize, skb->nfct might
    be unique after all, i.e. we again end up pushing work on users/cases
    where this might not be needed at all.

 2. We can't know if all clones get queued via nfqueue, so we could have something like:
    a. clone is created
    b. this clone is sent to userspace for queueing
    c. another clone is created
    d. this clone is NOT queued to userspace and about to continue traversal
    e. at same time, userspace reinjects the older clone on other cpu
    -> we race again despite such 'serializer'.

This leaves option number IV:
 - when cloning in bridge, also make a full copy of ->nfct.
 This has 2 issues:
   1. dependency from bridge to conntrack (yuck)
   2. confirmation will find clash and toss the clones ...
    ... so we will need more clash resolution hacks (double-yuck).

At this point I am considering submitting a patch to
remove nfqueue support for bridge netfilter altogether.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: nfqueue & bridge netfilter considered broken
  2016-09-02  9:08 nfqueue & bridge netfilter considered broken Florian Westphal
@ 2016-09-02  9:58 ` Pablo Neira Ayuso
  2016-09-02 10:00   ` Pablo Neira Ayuso
  0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-02  9:58 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Sep 02, 2016 at 11:08:48AM +0200, Florian Westphal wrote:
> I - discard extra nfct entry when cloning.  Works, but obviously not
>  compatible in any way (the clones are INVALID).

This approach is simple and it would only break when packets are
flooded to all ports, actually this is not working anyway because of
clashes at confirm, right?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: nfqueue & bridge netfilter considered broken
  2016-09-02  9:58 ` Pablo Neira Ayuso
@ 2016-09-02 10:00   ` Pablo Neira Ayuso
  2016-09-02 10:22     ` Florian Westphal
  0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-02 10:00 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Sep 02, 2016 at 11:58:53AM +0200, Pablo Neira Ayuso wrote:
> On Fri, Sep 02, 2016 at 11:08:48AM +0200, Florian Westphal wrote:
> > I - discard extra nfct entry when cloning.  Works, but obviously not
> >  compatible in any way (the clones are INVALID).
> 
> This approach is simple and it would only break when packets are
> flooded to all ports, actually this is not working anyway because of
> clashes at confirm, right?

Hm, what about attaching the notrack conntrack for this case?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: nfqueue & bridge netfilter considered broken
  2016-09-02 10:00   ` Pablo Neira Ayuso
@ 2016-09-02 10:22     ` Florian Westphal
  2016-09-06 10:10       ` Pablo Neira Ayuso
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2016-09-02 10:22 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Fri, Sep 02, 2016 at 11:58:53AM +0200, Pablo Neira Ayuso wrote:
> > On Fri, Sep 02, 2016 at 11:08:48AM +0200, Florian Westphal wrote:
> > > I - discard extra nfct entry when cloning.  Works, but obviously not
> > >  compatible in any way (the clones are INVALID).
> > 
> > This approach is simple and it would only break when packets are
> > flooded to all ports, actually this is not working anyway because of
> > clashes at confirm, right?
> 
> Hm, what about attaching the notrack conntrack for this case?

This is what Patrick said last time this came up (source:
http://marc.info/?l=netfilter-devel&m=131471329004889&w=2 ):

"I don't think the clones should have invalid state, even untracked is
very questionable since all packets should have NAT applied to them in
the same way, connmarks might be used etc.

We probably need to restore the above mentioned assumption somehow. One
way would be to serialize reinjection of packets belonging to
unconfirmed conntracks in nf_reinject or the queueing modules. Conntrack
related stuff doesn't really belong there, but it seems like the easiest
and safest fix to me."

As for bridge conntrack, this is indeed a good question.

Seems we will need to register a dedicated conntrack bridge hook that
takes care of uncloning in FORWARD hook, i.e. add a hook in FORWARD
that makes a deep copy of all unconfirmed conntracks if skb is cloned,
and (once skb reaches nf_confirm) do a non-destructive clash resolution
(accept instead of drop of the clashing entries should be enough).

We have to sacrifice another status bit for this, or perhaps add a
bridge conntrack extension to store such a clash hint though.

Any other idea?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: nfqueue & bridge netfilter considered broken
  2016-09-02 10:22     ` Florian Westphal
@ 2016-09-06 10:10       ` Pablo Neira Ayuso
  2016-09-06 11:24         ` Florian Westphal
  0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2016-09-06 10:10 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel

On Fri, Sep 02, 2016 at 12:22:44PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Fri, Sep 02, 2016 at 11:58:53AM +0200, Pablo Neira Ayuso wrote:
> > > On Fri, Sep 02, 2016 at 11:08:48AM +0200, Florian Westphal wrote:
> > > > I - discard extra nfct entry when cloning.  Works, but obviously not
> > > >  compatible in any way (the clones are INVALID).
> > > 
> > > This approach is simple and it would only break when packets are
> > > flooded to all ports, actually this is not working anyway because of
> > > clashes at confirm, right?
> > 
> > Hm, what about attaching the notrack conntrack for this case?
> 
> This is what Patrick said last time this came up (source:
> http://marc.info/?l=netfilter-devel&m=131471329004889&w=2 ):
> 
> "I don't think the clones should have invalid state, even untracked is
> very questionable since all packets should have NAT applied to them in
> the same way, connmarks might be used etc.
> 
> We probably need to restore the above mentioned assumption somehow. One
> way would be to serialize reinjection of packets belonging to
> unconfirmed conntracks in nf_reinject or the queueing modules. Conntrack
> related stuff doesn't really belong there, but it seems like the easiest
> and safest fix to me."
> 
> As for bridge conntrack, this is indeed a good question.
> 
> Seems we will need to register a dedicated conntrack bridge hook that
> takes care of uncloning in FORWARD hook, i.e. add a hook in FORWARD
> that makes a deep copy of all unconfirmed conntracks if skb is cloned,
> and (once skb reaches nf_confirm) do a non-destructive clash resolution
> (accept instead of drop of the clashing entries should be enough).
>
> We have to sacrifice another status bit for this, or perhaps add a
> bridge conntrack extension to store such a clash hint though.

Assuming nf_nat_setup_info() was not yet called, ie. NAT from
postrouting case, then these packets with a deep copy and the flag set
may get different ports given the port clash resolution, then the
clash resolution would need to unmangle packets to get them back to a
consistent configuration.

This is something that can only happen from nfqueue if any of the
multiqueues approach is used to distribute packets between several
CPUs, right?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: nfqueue & bridge netfilter considered broken
  2016-09-06 10:10       ` Pablo Neira Ayuso
@ 2016-09-06 11:24         ` Florian Westphal
  0 siblings, 0 replies; 6+ messages in thread
From: Florian Westphal @ 2016-09-06 11:24 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Fri, Sep 02, 2016 at 12:22:44PM +0200, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > On Fri, Sep 02, 2016 at 11:58:53AM +0200, Pablo Neira Ayuso wrote:
> > > > On Fri, Sep 02, 2016 at 11:08:48AM +0200, Florian Westphal wrote:
> > > > > I - discard extra nfct entry when cloning.  Works, but obviously not
> > > > >  compatible in any way (the clones are INVALID).
> > > > 
> > > > This approach is simple and it would only break when packets are
> > > > flooded to all ports, actually this is not working anyway because of
> > > > clashes at confirm, right?
> > > 
> > > Hm, what about attaching the notrack conntrack for this case?
> > 
> > This is what Patrick said last time this came up (source:
> > http://marc.info/?l=netfilter-devel&m=131471329004889&w=2 ):
> > 
> > "I don't think the clones should have invalid state, even untracked is
> > very questionable since all packets should have NAT applied to them in
> > the same way, connmarks might be used etc.

> > way would be to serialize reinjection of packets belonging to
> > unconfirmed conntracks in nf_reinject or the queueing modules. Conntrack
> > related stuff doesn't really belong there, but it seems like the easiest
> > and safest fix to me."
> > 
> > As for bridge conntrack, this is indeed a good question.
> > 
> > Seems we will need to register a dedicated conntrack bridge hook that
> > takes care of uncloning in FORWARD hook, i.e. add a hook in FORWARD
> > that makes a deep copy of all unconfirmed conntracks if skb is cloned,
> > and (once skb reaches nf_confirm) do a non-destructive clash resolution
> > (accept instead of drop of the clashing entries should be enough).
> >
> > We have to sacrifice another status bit for this, or perhaps add a
> > bridge conntrack extension to store such a clash hint though.
> 
> Assuming nf_nat_setup_info() was not yet called, ie. NAT from
> postrouting case, then these packets with a deep copy and the flag set
> may get different ports given the port clash resolution, then the
> clash resolution would need to unmangle packets to get them back to a
> consistent configuration.

Right -- one of the reasons why I did not plan on adding NAT hooks
for NFPROTO_BRIDGE ...

> This is something that can only happen from nfqueue if any of the
> multiqueues approach is used to distribute packets between several
> CPUs, right?

I think we also might have a race with local delivery/upcall vs.
flood forwarding, local_rcv path in bridge uses netif_receive_skb so skb
might be queued on percpu backlog. (the race is much more prominent
with per-cpu nfqueueing though and I don't recall seeing such race
related crashes without nfqueue presence).

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-09-06 11:24 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-02  9:08 nfqueue & bridge netfilter considered broken Florian Westphal
2016-09-02  9:58 ` Pablo Neira Ayuso
2016-09-02 10:00   ` Pablo Neira Ayuso
2016-09-02 10:22     ` Florian Westphal
2016-09-06 10:10       ` Pablo Neira Ayuso
2016-09-06 11:24         ` Florian Westphal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.