From: Florian Westphal <fw@strlen.de>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: Florian Westphal <fw@strlen.de>,
davem@davemloft.net, netdev@vger.kernel.org,
Pablo Neira Ayuso <pablo@netfilter.org>,
Alexei Starovoitov <ast@kernel.org>
Subject: Re: [PATCH net-next] netfilter: revert introduction of egress hook
Date: Wed, 18 Mar 2020 13:33:15 +0100 [thread overview]
Message-ID: <20200318123315.GI979@breakpoint.cc> (raw)
In-Reply-To: <c7c6fb40-06f9-8078-6f76-5dc75a094e25@iogearbox.net>
Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 3/18/20 11:02 AM, Florian Westphal wrote:
> > Daniel Borkmann <daniel@iogearbox.net> wrote:
> > > This reverts the following commits:
> > >
> > > 8537f78647c0 ("netfilter: Introduce egress hook")
> > > 5418d3881e1f ("netfilter: Generalize ingress hook")
> > > b030f194aed2 ("netfilter: Rename ingress hook include file")
> > >
> > > From the discussion in [0], the author's main motivation to add a hook
> > > in fast path is for an out of tree kernel module, which is a red flag
> > > to begin with.
> >
> > The author did post patches for nftables, i.e. you can hook up rulesets to
> > this new hook point.
> >
> > > is on future extensions w/o concrete code in the tree yet. Revert as
> > > suggested [1] given the weak justification to add more hooks to critical
> > > fast-path.
> >
> > Do you have an alternative suggestion on how to expose this?
>
> Yeah, I think we should not plaster the stack with same/similar hooks that
> achieve the same functionality next to each other over and over at the cost
> of performance for users .. ideally there should just be a single entry point
> that is very lightweight/efficient when not used and can otherwise patch to
> a direct call when in use. Recent work from KP Singh goes into this direction
> with the fmodify_return work [0], so we would have a single static key which
> wraps an empty function call entry which can then be patched by the kernel at
> runtime. Inside that trampoline we can still keep the ordering intact, but
> imho this would overall better reduce overhead when functionality is not used
> compared to the practice of duplication today.
Thanks for explaining. If I understand this correctly then:
1. sch_handle_egress() becomes a non-inlined function that isn't called
from __dev_queue_xmit or any other location
2. __dev_queue_xmit calls a dummy do-nothing function wrapped in
existing egress-static-key
3. kernels sched/tc code can patch the dummy function so it calls
sch_handle_egress, without userspace changes/awareness
4. netfilter could reuse this even when tc is already patched in, so
the dummy function does two direct calls.
How does that differ from current code? One could also re-arrange
things like this (diff below, just for illustration).
The only difference I see vs. my understanding of your proposal is:
1. no additional static key, nf_hook_egress_active() doesn't exist
2. nf_hook_egress exists, but isn't called anywhere, patched-in at runtime
3. sch_handle_egress isn't called anywhere either, patched-in too
Did I get that right? The idea/plan looks good to me, it just looks
like a very marginal difference to me, thats why I'm asking.
Thanks!
diff --git a/net/core/dev.c b/net/core/dev.c
index aeb8ccbbe93b..406ac86b6d6c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3770,13 +3770,24 @@ int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
EXPORT_SYMBOL(dev_loopback_xmit);
#ifdef CONFIG_NET_EGRESS
-static struct sk_buff *
+static int nf_egress(struct sk_buff *skb)
+{
+ if (nf_hook_egress_active(skb))
+ return nf_hook_egress(skb);
+
+ return 0;
+}
+
+static noinline struct sk_buff *
sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
{
-#ifdef CONFIG_NET_CLS_ACT
- struct mini_Qdisc *miniq = rcu_dereference_bh(dev->miniq_egress);
+ struct mini_Qdisc *miniq;
struct tcf_result cl_res;
+ if (nf_egress(skb) < 0)
+ return NULL;
+
+ miniq = rcu_dereference_bh(dev->miniq_egress);
if (!miniq)
return skb;
@@ -3812,19 +3823,6 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
}
#endif /* CONFIG_NET_EGRESS */
#ifdef CONFIG_XPS
static int __get_xps_queue_idx(struct net_device *dev, struct sk_buff *skb,
struct xps_dev_maps *dev_maps, unsigned int tci)
@@ -4014,9 +4012,6 @@ static int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev)
#endif
#ifdef CONFIG_NET_EGRESS
if (static_branch_unlikely(&egress_needed_key)) {
- if (nf_egress(skb) < 0)
- goto out;
-
skb = sch_handle_egress(skb, &rc, dev);
if (!skb)
goto out;
next prev parent reply other threads:[~2020-03-18 12:33 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-18 9:33 [PATCH net-next] netfilter: revert introduction of egress hook Daniel Borkmann
2020-03-18 9:36 ` Pablo Neira Ayuso
2020-03-18 9:41 ` Daniel Borkmann
2020-03-18 9:51 ` Pablo Neira Ayuso
2020-03-18 10:02 ` Florian Westphal
2020-03-18 10:50 ` Daniel Borkmann
2020-03-18 12:33 ` Florian Westphal [this message]
2020-03-18 14:22 ` Daniel Borkmann
2020-03-19 1:45 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200318123315.GI979@breakpoint.cc \
--to=fw@strlen.de \
--cc=ast@kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
--cc=pablo@netfilter.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).