All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick McHardy <kaber@trash.net>
To: David Miller <davem@davemloft.net>
Cc: pablo@netfilter.org, tgraf@suug.ch,
	netfilter-devel@vger.kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH 0/7 RFC] Netfilter/nf_tables ingress support
Date: Mon, 13 Apr 2015 21:19:14 +0100	[thread overview]
Message-ID: <20150413201913.GD20275@acer.localdomain> (raw)
In-Reply-To: <20150412.211421.1771298417488412635.davem@davemloft.net>

On 12.04, David Miller wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Fri, 10 Apr 2015 22:09:01 +0200
> 
> >> > This patchset adds the Netfilter hook at the ingress path, in a per-device
> >> > fashion. This also comes with the new nf_tables 'netdev' family support to
> >> > provide access to users to the existing nf_tables features. This includes the
> >> > transactional netlink API and the enhanced set infrastructure.  Several patches
> >> > come in first place to prepare this support, including the refactoring of
> >> > __netif_receive_skb_core() to accomodate the new hook.
> 
> It's always been the case that if you want to do L2 or lower level
> stuff, you use the ingress classifier, packet actions, and qdiscs.
> 
> If you want to do higher level things, NF hooks provide that facility.
> 
> The only example I've seen is packet counting, and one could do that
> just as easily with the ingress qdisc.
> 
> So given what I've been shown so far I'm very far from convinced that
> this new hook in an already over polluted, most critical of all
> critical code paths, is justified at all.

I'm going to jump in here since I think a good case for this can be
made. It's going to be a bit longer, sorry. You can skip to the
examples after the first three paragraphs since it's just a lot of
TC bashing :)

First, lets start with the ingress qdisc hook, since it is somewhat
related. The ingress qdisc is basically a hack to get filters to run
at ingress. There is no queue, its a workaround for the fact that
we don't have any other way to filter at ingress, and in my opinion
not a nice one.

Its is today implemented as an enqueue call in netif_receive_skb(),
but it used to register with IPv4/IPv6 netfilter to receive packets.
If it is actually used, there is full serialization since a global
lock is used. If it is not used, the cost is pretty minimal except
for the quite large amount of code.

Now, if we had a netfilter hook at ingress *without* the okfn, the
cost when disabled would simply be a static key, when enabled a
hook invocation, pretty much comparable to ingress. Given that
ingress used to be implemented on top of netfilter, there's no
reason why we couldn't do that again. I'm pretty much convinced
that we could easily make the impact smaller than it is now.

Now, given the features. Ingress filtering is mainly used for
TC actions, which are in my opinion the most horible way imaginable
to so something like that. tc is a horrible mess and tc actions
are the worst part of it. The fact that people over time have still
added some TC actions is in my opinion telling that a better
way to do this stuff would be appreciated, its hard to imagine this
being the first choice of anyone.

Looking at the TC actions, some are directly related to using
netfilter (ipt, connmark). ipt is actually dangerous in that
it can easily crash your kernel, from a quick look connmark
looks equally unsafe in that it doesn't perform even basic
header validation that conntrack relies on. We can obviously
take care of the netfilter related actions easily. skbedit is
like a very primitive meta expression from nftables, stateless
NAT and pedit are something that can easily be supported using
nftables and we need that (generic) support anyways. The VLAN
action and policing is something I'm unsure about, but all
the remaining ones we don't already support basically come for
free since we need that functionality (packet editing) anyway.

Now the advantages of being able to use nft. First, the obvious
one is that we have a nice userspace tool, a well defined
grammar, and that people would be able to use the same tool for
very similar tasks. nftables in the kernel is almost completely
lockless, we support way more possibilites already and we won't
have to add new special case TC actions anymore. Look at the
connmark action for example. It can set a value. How long until
someone wants to use a bitmask? We support all operations
(assignment, bit operations) for all types, we have sets for fast
lookups, maps for associating values quickly, we have a nice and
readable syntax and full translation back to the readable
representation and much more.

Now you asked for some examples :)

As I mentioned, we can set meta data in various ways, we will
be able to do stateless NAT and pedit, of course very flexible
classification etc etc, but some of the cool stuff which would
be useful especially in ingress:

Are you wondering why CPUs are used unbalanced and want to
see what kind of traffic is causing it? With the dynamic
expression instantiation queued in nf-next, you could do:

nft add ingress filter \
	flow table debug cpu . ether saddr counter

To dynamically create a table of counters called "debug" for
all CPU + MAC source combinations. Listing it will show you

	0 . 20:1a:06:e5:cb:be ... bytes ... packets
	0 . 9c:d2:1e:74:eb:75 ... bytes ... packets
	1 . 52:54:00:79:83:a2 ... bytes ... packets
	...

I'm currently working on various ways to sort it in every
dimension so you can easily find the information you're looking
for.

If it doesn't seem to be unevenly distributed based on MACs,
you could try IP source and IP destination address instead:

nft add ingress filter \
	flow table debug cpu . ip saddr . ip daddr counter

Which will give you a table of counters for every CPU + saddr + daddr
combination. You can combine any kind of key we support, which
are a lot. This could be a nice help for debugging performance
problems.

So, in short, I think this makes a lot of sense to have, but
in order to avoid performance impact, ingress should be moved
back to netfilter (since its the more generic approach) and
the async resumption (okfn) should be avoided completely.

  reply	other threads:[~2015-04-13 20:19 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-10 12:15 [PATCH 0/7 RFC] Netfilter/nf_tables ingress support Pablo Neira Ayuso
2015-04-10 12:15 ` [PATCH 1/7] net: refactor __netif_receive_skb_core Pablo Neira Ayuso
2015-04-10 13:47   ` Daniel Borkmann
2015-04-15 16:09     ` Jesper Dangaard Brouer
2015-04-16  5:49       ` Patrick McHardy
2015-04-10 19:56   ` Alexander Duyck
2015-04-15 12:44     ` David Laight
2015-04-15 13:28       ` Alexander Duyck
2015-04-10 12:15 ` [PATCH 2/7] netfilter: add nf_hook_list_active() Pablo Neira Ayuso
2015-04-10 12:15 ` [PATCH 3/7] netfilter: add hook list to nf_hook_state Pablo Neira Ayuso
2015-04-10 12:15 ` [PATCH 4/7] netfilter: cleanup struct nf_hook_ops struct indentation Pablo Neira Ayuso
2015-04-10 13:27   ` Sergei Shtylyov
2015-04-10 12:15 ` [PATCH 5/7] net: add netfilter ingress hook Pablo Neira Ayuso
2015-04-10 13:21   ` Thomas Graf
2015-04-10 13:36     ` Patrick McHardy
2015-04-10 20:17       ` Pablo Neira Ayuso
2015-04-10 21:33         ` Patrick McHardy
2015-04-11 12:55           ` Pablo Neira Ayuso
2015-04-11 13:06             ` Patrick McHardy
2015-04-11 13:32               ` Pablo Neira Ayuso
2015-04-10 20:08     ` Pablo Neira Ayuso
2015-04-10 12:15 ` [PATCH 6/7] netfilter: nf_tables: allow to bind table to net_device Pablo Neira Ayuso
2015-04-10 12:15 ` [PATCH 7/7] netfilter: nf_tables: add netdev table to filter from ingress Pablo Neira Ayuso
2015-04-10 13:22 ` [PATCH 0/7 RFC] Netfilter/nf_tables ingress support Thomas Graf
2015-04-10 20:09   ` Pablo Neira Ayuso
2015-04-13  1:14     ` David Miller
2015-04-13 20:19       ` Patrick McHardy [this message]
2015-04-14  9:00         ` Thomas Graf
2015-04-14  9:06           ` Patrick McHardy
2015-04-14 10:08             ` Thomas Graf
2015-04-14 10:13               ` Patrick McHardy
2015-04-14 10:32                 ` Thomas Graf
2015-04-14 20:05                   ` Jesper Dangaard Brouer
2015-04-14 12:27         ` Jamal Hadi Salim
2015-04-14 15:12           ` John Fastabend
2015-04-14 15:36             ` Alexei Starovoitov
2015-04-15  7:35               ` John Fastabend
2015-04-15  9:19                 ` Daniel Borkmann
2015-04-15 16:24                 ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150413201913.GD20275@acer.localdomain \
    --to=kaber@trash.net \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=tgraf@suug.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.