From: "Toke Høiland-Jørgensen" <toke@toke.dk>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org, cake@lists.bufferbloat.net,
	netfilter-devel@vger.kernel.org
Subject: Re: [PATCH net-next v15 4/7] sch_cake: Add NAT awareness to packet classifier
Date: Wed, 23 May 2018 22:38:30 +0200
Message-ID: <87in7exg3d.fsf@toke.dk>
In-Reply-To: <20180523.144442.864194409238516747.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Tue, 22 May 2018 15:57:38 +0200
>
>> When CAKE is deployed on a gateway that also performs NAT (which is a
>> common deployment mode), the host fairness mechanism cannot distinguish
>> internal hosts from each other, and so fails to work correctly.
>> 
>> To fix this, we add an optional NAT awareness mode, which will query the
>> kernel conntrack mechanism to obtain the pre-NAT addresses for each packet
>> and use that in the flow and host hashing.
>> 
>> When the shaper is enabled and the host is already performing NAT, the cost
>> of this lookup is negligible. However, in unlimited mode with no NAT being
>> performed, there is a significant CPU cost at higher bandwidths. For this
>> reason, the feature is turned off by default.
>> 
>> Cc: netfilter-devel@vger.kernel.org
>> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
>
> This is really pushing the limits of what a packet scheduler can
> require for correct operation.

Well, Cake is all about pushing the limits of what a packet scheduler
can do... ;)

> And this creates an incredibly ugly dependency.

Yeah, I do agree with that, and I'd love to get rid of it. I even tried
prototyping what it would take to look up the symbols at runtime using
kallsyms. It wasn't exactly prettier; I pushed it here in case anyone
wants to recoil in horror (completely untested, just got it to the point
where the module compiles with no nf_* symbols according to objdump):

https://github.com/dtaht/sch_cake/commit/97270a10dcea236d137f5113aaeb4303098ab3f3

> I'd much rather you do something NAT method agnostic, like save or
> compute the necessary information on ingress and then later use it on
> egress.

How would this work? We would have to add some kind of global state
shared between all instances of the qdisc, and maintain state for all
flows we see going through there, effectively duplicating conntrack, and
also requiring people to run Cake on all interfaces? How is that better?

> Because what you have here will completely break when someone does NAT
> using eBPF, act_nat, or similar.
>
> There is even skb->rxhash, be creative :-)

This is not actually about improving hashing; the post-NAT information
is fine for that. It's about making sure the per-host fairness works
when NATing, so we can distribute bandwidth between the hosts on the
local LAN regardless of how many flows they open. This is one of the
"killer features" of Cake - it was the top requested feature until we
implemented it. So it would be a shame to drop it.

Since act_nat is a 1-to-1 mapping I don't think we would have any loss
of functionality with that. For eBPF, well, obviously all bets are off
as far as reusing any state. But it's not unreasonable to expect people
who do NAT in eBPF to also set skb->tc_classid if they want pre-NAT host
fairness, is it?

Which means that the only remaining issue is the module dependency. Can
we live with that (noting that it'll go away if conntrack is configured
out of the kernel entirely)? Or is the kallsyms approach a viable way
forward? I guess we could add a Kconfig option that toggles between that
and native calls, so that we'd at least get a compile error on suitably
configured kernels if the API changes...

-Toke
