All of lore.kernel.org
 help / color / mirror / Atom feed
From: Florian Westphal <fw@strlen.de>
To: netfilter-devel@vger.kernel.org
Subject: nfqueue & bridge netfilter considered broken
Date: Fri, 2 Sep 2016 11:08:48 +0200	[thread overview]
Message-ID: <20160902090848.GA506@breakpoint.cc> (raw)

Hi.

This is a note to summarize state of bridge + br_netfilter + nfqueue.

TL;DR: I am giving up.  I see no way to fix this in a sane fashion.

What I tried:
I - discard extra nfct entry when cloning.  Works, but obviously not
 compatible in any way (the clones are INVALID).

II - add locking where needed.  It gets out of hand rather quickly,
 it is also pushed on everyone else rather than just those places
 that cause clones with unconfirmed conntrack entries (only places
 that i know of is mcast routing and the bridge).

III - serialize on reinject.
 Idea was to change nfqnl_recv_verdict() to (instead of nf_reinject() )
 call a wrapper, do_nf_reinject, which essentially does:

static void do_nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)
{
#if IS_ENABLED(NF_CONNTRACK)
   struct nf_conn *ct = entry->skb->nfct;

  if (!ct || nf_ct_is_confirmed(ct)) {
       nf_reinject(entry, verdict);
       return 0;
  }

  if (skb_cloned(entry->skb)) {
      module_get(THIS_MODULE);
      push_to_work_queue_for_reinject(entry, verdict);
      return;
  }
#endif
  nf_reinject(entry, verdict);
}

The work queue then makes sure that we feed such skb one-by-one.

Aside from this being ugly as f there are two problems:
 1. skb_cloned() being true does NOT mean that we have to serialize, skb->nfct might
    be unique after all, i.e. we again end up pushing work on users/cases
    where this might not be needed at all.

 2. We can't know if all clones get queued via nfqueue, so we could have something like:
    a. clone is created
    b. this clone is sent to userspace for queueing
    c. another clone is created
    d. this clone is NOT queued to userspace and about to continue traversal
    e. at same time, userspace reinjects the older clone on other cpu
    -> we race again despite such 'serializer'.

This leaves option number IV:
 - when cloning in bridge, also make a full copy of ->nfct.
 This has 2 issues:
   1. dependency from bridge to conntrack (yuck)
   2. confirmation will find clash and toss the clones ...
    ... so we will need more clash resolution hacks (double-yuck).

At this point I am considering submitting a patch to
remove nfqueue support for bridge netfilter altogether.

             reply	other threads:[~2016-09-02  9:09 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-02  9:08 Florian Westphal [this message]
2016-09-02  9:58 ` nfqueue & bridge netfilter considered broken Pablo Neira Ayuso
2016-09-02 10:00   ` Pablo Neira Ayuso
2016-09-02 10:22     ` Florian Westphal
2016-09-06 10:10       ` Pablo Neira Ayuso
2016-09-06 11:24         ` Florian Westphal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160902090848.GA506@breakpoint.cc \
    --to=fw@strlen.de \
    --cc=netfilter-devel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.