All of lore.kernel.org
 help / color / mirror / Atom feed
* [flamebait] xdp, well meaning but pointless
@ 2016-12-01  9:11 Florian Westphal
  2016-12-01 13:42 ` Hannes Frederic Sowa
                   ` (3 more replies)
  0 siblings, 4 replies; 40+ messages in thread
From: Florian Westphal @ 2016-12-01  9:11 UTC (permalink / raw)
  To: netdev

[ As already mentioned in my reply to Tom, here is
the xdp flamebait/critique ]

Lots of XDP related patches started to appear on netdev.
I'd prefer if it would stop...

To me XDP combines all disadvantages of stack bypass solutions like dpdk
with the disadvantages of kernel programming with a more limited
instruction set and toolchain.

Unlike XDP userspace bypass (dpdk et al) allow use of any programming
model or language you want (including scripting languages), which
makes things a lot easier, e.g. garbage collection, debuggers vs.
crash+vmcore+printk...

I have heared the argument that these restrictions that come with
XDP are great because it allows to 'limit what users can do'.

Given existence of DPDK/netmap/userspace bypass is a reality, this is
a very weak argument -- why would anyone pick XDP over a dpdk/netmap
based solution?
XDP will always be less powerful and a lot more complicated,
especially considering users of dpdk (or toolkits built on top of it)
are not kernel programmers and userspace has more powerful ipc
(or storage) mechanisms.

Aside from this, XDP, like DPDK, is a kernel bypass.
You might say 'Its just stack bypass, not a kernel bypass!'.
But what does that mean exactly?  That packets can still be passed
onward to normal stack?
Bypass solutions like netmap can also inject packets back to
kernel stack again.

Running less powerful user code in a restricted environment in the kernel
address space is certainly a worse idea than separating this logic out
to user space.

In light of DPDKs existence it make a lot more sense to me to provide
a). a faster mmap based interface (possibly AF_PACKET based) that allows
to map nic directly into userspace, detaching tx/rx queue from kernel.

John Fastabend sent something like this last year as a proof of
concept, iirc it was rejected because register space got exposed directly
to userspace.  I think we should re-consider merging netmap
(or something conceptually close to its design).

b). with regards to a programmable data path: IFF one wants to do this
in kernel (and thats a big if), it seems much more preferrable to provide
a config/data-based approach rather than a programmable one.  If you want
full freedom DPDK is architecturally just too powerful to compete with.

Proponents of XDP sometimes provide usage examples.
Lets look at some of these.

== Application developement: ==
* DNS Server
data structures and algorithms need to be implemented in a mostly touring
complete language, so eBPF cannot readily be be used for that.
At least it will be orders of magnitude harder than in userspace.

* TCP Endpoint
TCP processing in eBPF is a bit out of question while userspace tcp stacks
based on both netmap and dpdk already exist today.

== Forwarding dataplane: ==

* Router/Switch
Router and switches should actually adhere to standardized and specified
protocols and thus don't need a lot of custom software and specialized
software.  Still a lot more work compared to userspace offloads where
you can do things like allocating a 4GB array to perform nexthop lookup.
Also needs ability to perform tx on another interface.

* Load balancer
State holding algorithm need sorting and searching, so also no fit for
eBPF (could be exposed by function exports, but then can we do DoS by
finding worst case scenarios?).

Also again needs way to forward frame out via another interface.

For cases where packet gets sent out via same interface it would appear
to be easier to use port mirroring in a switch and use stochastic filtering
on end nodes to determine which host should take responsibility.

XDP plus: central authority over how distribution will work in case
nodes are added/removed from pool.
But then again, it will be easier to hande this with netmap/dpdk where
more complicated scheduling algorithms can be used.

* early drop/filtering.
While its possible to do "u32" like filters with ebpf, all modern nics
support ntuple filtering in hardware, which is going to be faster because
such packet will never even be signalled to the operating system.
For more complicated cases (e.g. doing socket lookup to check if particular
packet does match bound socket (and expected sequence numbers etc) I don't
see easy ways to do that with XDP (and without sk_buff context).
Providing it via function exports is possible of course, but that will only
result in an "arms race" where we will see special-sauce functions
all over the place -- DoS will always attempt to go for something
that is difficult to filter against, cf. all the recent volume-based
floodings.

Thanks, Florian

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01  9:11 [flamebait] xdp, well meaning but pointless Florian Westphal
@ 2016-12-01 13:42 ` Hannes Frederic Sowa
  2016-12-01 14:58 ` Thomas Graf
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-01 13:42 UTC (permalink / raw)
  To: Florian Westphal, netdev

On 01.12.2016 10:11, Florian Westphal wrote:
> [ As already mentioned in my reply to Tom, here is
> the xdp flamebait/critique ]
> 
> Lots of XDP related patches started to appear on netdev.
> I'd prefer if it would stop...

I discussed this with Florian and helped with the text. I want to
mention this to express my full support for this.

Thanks,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01  9:11 [flamebait] xdp, well meaning but pointless Florian Westphal
  2016-12-01 13:42 ` Hannes Frederic Sowa
@ 2016-12-01 14:58 ` Thomas Graf
  2016-12-01 15:52   ` Hannes Frederic Sowa
                     ` (2 more replies)
       [not found] ` <CALx6S35R_ZStV=DbD-7Gf_y5xXqQq113_6m5p-p0GQfv46v0Ow@mail.gmail.com>
  2016-12-02 17:22 ` Jesper Dangaard Brouer
  3 siblings, 3 replies; 40+ messages in thread
From: Thomas Graf @ 2016-12-01 14:58 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev

On 12/01/16 at 10:11am, Florian Westphal wrote:
> Aside from this, XDP, like DPDK, is a kernel bypass.
> You might say 'Its just stack bypass, not a kernel bypass!'.
> But what does that mean exactly?  That packets can still be passed
> onward to normal stack?
> Bypass solutions like netmap can also inject packets back to
> kernel stack again.

I have a fundamental issue with the approach of exporting packets into
user space and reinjecting them: Once the packet leaves the kernel,
any security guarantees are off. I have no control over what is
running in user space and whether whatever listener up there has been
compromised or not. To me, that's a no go, in particular for servers
hosting multi tenant workloads. This is one of the main reasons why
XDP, in particular in combination with BPF, is very interesting to me.

> b). with regards to a programmable data path: IFF one wants to do this
> in kernel (and thats a big if), it seems much more preferrable to provide
> a config/data-based approach rather than a programmable one.  If you want
> full freedom DPDK is architecturally just too powerful to compete with.

I must have missed the legal disclaimer that is usually put in front
of the DPDK marketing show :-)

I don't want full freedom. I want programmability with stack integration
at sufficient speed and the ability to benefit from the hardware
abstractions that the kernel provides.

> Proponents of XDP sometimes provide usage examples.
> Lets look at some of these.

[ I won't comment on any of the other use cases because they are of no
  interest to me ]

> * Load balancer
> State holding algorithm need sorting and searching, so also no fit for
> eBPF (could be exposed by function exports, but then can we do DoS by
> finding worst case scenarios?).
> 
> Also again needs way to forward frame out via another interface.
> 
> For cases where packet gets sent out via same interface it would appear
> to be easier to use port mirroring in a switch and use stochastic filtering
> on end nodes to determine which host should take responsibility.
> 
> XDP plus: central authority over how distribution will work in case
> nodes are added/removed from pool.
> But then again, it will be easier to hande this with netmap/dpdk where
> more complicated scheduling algorithms can be used.

I agree with you if the LB is a software based appliance in either a
dedicated VM or on dedicated baremetal.

The reality is turning out to be different in many cases though, LB
needs to be performed not only for north south but east west as well.
So even if I would handle LB for traffic entering my datacenter in user
space, I will need the same LB for packets from my applications and
I definitely don't want to move all of that into user space.

> * early drop/filtering.
> While its possible to do "u32" like filters with ebpf, all modern nics
> support ntuple filtering in hardware, which is going to be faster because
> such packet will never even be signalled to the operating system.
> For more complicated cases (e.g. doing socket lookup to check if particular
> packet does match bound socket (and expected sequence numbers etc) I don't
> see easy ways to do that with XDP (and without sk_buff context).
> Providing it via function exports is possible of course, but that will only
> result in an "arms race" where we will see special-sauce functions
> all over the place -- DoS will always attempt to go for something
> that is difficult to filter against, cf. all the recent volume-based
> floodings.

You probably put this last because this was the most difficult to
shoot down ;-)

The benefits of XDP for this use case are extremely obvious in combination
with local applications which need to be protected. ntuple filters won't
cut it. They are limited and subject to a certain rate at which they
can be configured. Any serious mitigation will require stateful filtering
with at least minimal L7 matching abilities and this is exactly where XDP
will excel.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 14:58 ` Thomas Graf
@ 2016-12-01 15:52   ` Hannes Frederic Sowa
  2016-12-01 16:28     ` Thomas Graf
  2016-12-01 16:06   ` [flamebait] xdp, well meaning but pointless Florian Westphal
  2016-12-01 16:19   ` David Miller
  2 siblings, 1 reply; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-01 15:52 UTC (permalink / raw)
  To: Thomas Graf, Florian Westphal; +Cc: netdev

Hi,

On 01.12.2016 15:58, Thomas Graf wrote:
> On 12/01/16 at 10:11am, Florian Westphal wrote:
>> Aside from this, XDP, like DPDK, is a kernel bypass.
>> You might say 'Its just stack bypass, not a kernel bypass!'.
>> But what does that mean exactly?  That packets can still be passed
>> onward to normal stack?
>> Bypass solutions like netmap can also inject packets back to
>> kernel stack again.
> 
> I have a fundamental issue with the approach of exporting packets into
> user space and reinjecting them: Once the packet leaves the kernel,
> any security guarantees are off. I have no control over what is
> running in user space and whether whatever listener up there has been
> compromised or not. To me, that's a no go, in particular for servers
> hosting multi tenant workloads. This is one of the main reasons why
> XDP, in particular in combination with BPF, is very interesting to me.

First of all, this is a rant targeted at XDP and not at eBPF as a whole.
XDP manipulates packets at free will and thus all security guarantees
are off as well as in any user space solution.

Secondly user space provides policy, acl, more controlled memory
protection, restartability and better debugability. If I had multi
tenant workloads I would definitely put more complex "business/acl"
logic into user space, so I can make use of LSM and other features to
especially prevent a network facing service to attack the tenants. If
stuff gets put into the kernel you run user controlled code in the
kernel exposing a much bigger attack vector.

What use case do you see in XDP specifically e.g. for container networking?

>> b). with regards to a programmable data path: IFF one wants to do this
>> in kernel (and thats a big if), it seems much more preferrable to provide
>> a config/data-based approach rather than a programmable one.  If you want
>> full freedom DPDK is architecturally just too powerful to compete with.
> 
> I must have missed the legal disclaimer that is usually put in front
> of the DPDK marketing show :-)
>
> I don't want full freedom. I want programmability with stack integration
> at sufficient speed and the ability to benefit from the hardware
> abstractions that the kernel provides.
> 
>> Proponents of XDP sometimes provide usage examples.
>> Lets look at some of these.
> 
> [ I won't comment on any of the other use cases because they are of no
>   interest to me ]
> 
>> * Load balancer
>> State holding algorithm need sorting and searching, so also no fit for
>> eBPF (could be exposed by function exports, but then can we do DoS by
>> finding worst case scenarios?).
>>
>> Also again needs way to forward frame out via another interface.
>>
>> For cases where packet gets sent out via same interface it would appear
>> to be easier to use port mirroring in a switch and use stochastic filtering
>> on end nodes to determine which host should take responsibility.
>>
>> XDP plus: central authority over how distribution will work in case
>> nodes are added/removed from pool.
>> But then again, it will be easier to hande this with netmap/dpdk where
>> more complicated scheduling algorithms can be used.
> 
> I agree with you if the LB is a software based appliance in either a
> dedicated VM or on dedicated baremetal.
> 
> The reality is turning out to be different in many cases though, LB
> needs to be performed not only for north south but east west as well.
> So even if I would handle LB for traffic entering my datacenter in user
> space, I will need the same LB for packets from my applications and
> I definitely don't want to move all of that into user space.

The open question to me is why is programmability needed here.

Look at the discussion about ECMP and consistent hashing. It is not very
easy to actually write this code correctly. Why can't we just put C code
into the kernel that implements this once and for all and let user space
update the policies?

Load balancers have to deal correctly with ICMP packets, e.g. they even
have to be duplicated to every ECMP route. This seems to be problematic
to do in eBPF programs due to looping constructs so you end up with
complicated user space anyway.

>> * early drop/filtering.
>> While its possible to do "u32" like filters with ebpf, all modern nics
>> support ntuple filtering in hardware, which is going to be faster because
>> such packet will never even be signalled to the operating system.
>> For more complicated cases (e.g. doing socket lookup to check if particular
>> packet does match bound socket (and expected sequence numbers etc) I don't
>> see easy ways to do that with XDP (and without sk_buff context).
>> Providing it via function exports is possible of course, but that will only
>> result in an "arms race" where we will see special-sauce functions
>> all over the place -- DoS will always attempt to go for something
>> that is difficult to filter against, cf. all the recent volume-based
>> floodings.
> 
> You probably put this last because this was the most difficult to
> shoot down ;-)
> 
> The benefits of XDP for this use case are extremely obvious in combination
> with local applications which need to be protected. ntuple filters won't
> cut it. They are limited and subject to a certain rate at which they
> can be configured. Any serious mitigation will require stateful filtering
> with at least minimal L7 matching abilities and this is exactly where XDP
> will excel.

In my experience and research of DoS attacks you certainly want to put a
bit more logic into a filter than to look up something from hash tables
and drop it then. You certainly also want to have some more logic than
32 * 4096 instructions to execute, e.g. parsing and matching of DNS/NTP
packets with certain conditions and side look-ups. If you seriously do
that stuff you end up with a highly optimized programs containing
stochastic filters and also complex database logic.

If I want to drop based on hash table lookups, as Florian wrote, I would
let the hardware do that and assemble the tables in user space.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 14:58 ` Thomas Graf
  2016-12-01 15:52   ` Hannes Frederic Sowa
@ 2016-12-01 16:06   ` Florian Westphal
  2016-12-01 16:19   ` David Miller
  2 siblings, 0 replies; 40+ messages in thread
From: Florian Westphal @ 2016-12-01 16:06 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Florian Westphal, netdev

Thomas Graf <tgraf@suug.ch> wrote:
> On 12/01/16 at 10:11am, Florian Westphal wrote:
> > Aside from this, XDP, like DPDK, is a kernel bypass.
> > You might say 'Its just stack bypass, not a kernel bypass!'.
> > But what does that mean exactly?  That packets can still be passed
> > onward to normal stack?
> > Bypass solutions like netmap can also inject packets back to
> > kernel stack again.
> 
> I have a fundamental issue with the approach of exporting packets into
> user space and reinjecting them: Once the packet leaves the kernel,
> any security guarantees are off. I have no control over what is
> running in user space and whether whatever listener up there has been
> compromised or not. To me, that's a no go, in particular for servers
> hosting multi tenant workloads. This is one of the main reasons why
> XDP, in particular in combination with BPF, is very interesting to me.

Funny, I see it exactly the other way around :)

To me packet coming from this "userspace injection" is no different than
a tun/tap, or any other packet coming from network.

I see no change or increase in attack surface.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 14:58 ` Thomas Graf
  2016-12-01 15:52   ` Hannes Frederic Sowa
  2016-12-01 16:06   ` [flamebait] xdp, well meaning but pointless Florian Westphal
@ 2016-12-01 16:19   ` David Miller
  2016-12-01 16:51     ` Florian Westphal
  2016-12-01 17:20     ` Hannes Frederic Sowa
  2 siblings, 2 replies; 40+ messages in thread
From: David Miller @ 2016-12-01 16:19 UTC (permalink / raw)
  To: tgraf; +Cc: fw, netdev

From: Thomas Graf <tgraf@suug.ch>
Date: Thu, 1 Dec 2016 15:58:34 +0100

> The benefits of XDP for this use case are extremely obvious in combination
> with local applications which need to be protected. ntuple filters won't
> cut it. They are limited and subject to a certain rate at which they
> can be configured. Any serious mitigation will require stateful filtering
> with at least minimal L7 matching abilities and this is exactly where XDP
> will excel.

+1

Saying that ntuple filters can handle the early drop use case doesn't
take into consideration the nature of the tables (hundreds of
thousands of "evil" IP addresses), whether hardware can actually
handle that (it can't), and whether simple IP address matching is the
full extent of it (it isn't).

Most of the time when I hear anti-XDP rhetoric, it's usually comes
from a crowd who for some reason feels threatened by the technology
and what it might replace and make useless.

That to me says that we are _exactly_ going down the right path.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 15:52   ` Hannes Frederic Sowa
@ 2016-12-01 16:28     ` Thomas Graf
  2016-12-01 20:44       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 40+ messages in thread
From: Thomas Graf @ 2016-12-01 16:28 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Florian Westphal, netdev

On 12/01/16 at 04:52pm, Hannes Frederic Sowa wrote:
> First of all, this is a rant targeted at XDP and not at eBPF as a whole.
> XDP manipulates packets at free will and thus all security guarantees
> are off as well as in any user space solution.
> 
> Secondly user space provides policy, acl, more controlled memory
> protection, restartability and better debugability. If I had multi
> tenant workloads I would definitely put more complex "business/acl"
> logic into user space, so I can make use of LSM and other features to
> especially prevent a network facing service to attack the tenants. If
> stuff gets put into the kernel you run user controlled code in the
> kernel exposing a much bigger attack vector.
> 
> What use case do you see in XDP specifically e.g. for container networking?

DDOS mitigation to protect distributed applications in large clusters.
Relying on CDN works to protect API gateways and frontends (as long as
they don't throw you out of their network) but offers no protection
beyond that, e.g. a noisy/hostile neighbour. Doing this at the server
level and allowing the mitigation capability to scale up with the number
of servers is natural and cheap.

> > I agree with you if the LB is a software based appliance in either a
> > dedicated VM or on dedicated baremetal.
> > 
> > The reality is turning out to be different in many cases though, LB
> > needs to be performed not only for north south but east west as well.
> > So even if I would handle LB for traffic entering my datacenter in user
> > space, I will need the same LB for packets from my applications and
> > I definitely don't want to move all of that into user space.
> 
> The open question to me is why is programmability needed here.
> 
> Look at the discussion about ECMP and consistent hashing. It is not very
> easy to actually write this code correctly. Why can't we just put C code
> into the kernel that implements this once and for all and let user space
> update the policies?

Whatever LB logic is put in place with native C code now is unlikely the
logic we need in two years. We can't really predict the future. If it
was the case, networking would have been done long ago and we would all
be working on self eating ice cream now.

> Load balancers have to deal correctly with ICMP packets, e.g. they even
> have to be duplicated to every ECMP route. This seems to be problematic
> to do in eBPF programs due to looping constructs so you end up with
> complicated user space anyway.

Feel free to implement such complex LBs in user space or natively. It is
not required for the majority of use cases. The most popular LBs for
application load balancing have no idea of ECMP and require ECMP aware
routers to be made redundant itself.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 16:19   ` David Miller
@ 2016-12-01 16:51     ` Florian Westphal
  2016-12-01 17:20     ` Hannes Frederic Sowa
  1 sibling, 0 replies; 40+ messages in thread
From: Florian Westphal @ 2016-12-01 16:51 UTC (permalink / raw)
  To: David Miller; +Cc: tgraf, fw, netdev

David Miller <davem@davemloft.net> wrote:
> Saying that ntuple filters can handle the early drop use case doesn't
> take into consideration the nature of the tables (hundreds of
> thousands of "evil" IP addresses),

Thats not what I said.

But Ok, message received. I rest my case.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 16:19   ` David Miller
  2016-12-01 16:51     ` Florian Westphal
@ 2016-12-01 17:20     ` Hannes Frederic Sowa
  1 sibling, 0 replies; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-01 17:20 UTC (permalink / raw)
  To: David Miller, tgraf; +Cc: fw, netdev

On 01.12.2016 17:19, David Miller wrote:
> Saying that ntuple filters can handle the early drop use case doesn't
> take into consideration the nature of the tables (hundreds of
> thousands of "evil" IP addresses), whether hardware can actually
> handle that (it can't), and whether simple IP address matching is the
> full extent of it (it isn't).

Yes, that is why you certainly use ntuple filters in combination with
some kind of high level business logic in user space.

I have to check but am pretty sure you can't even do the simplest thing
in XDP, parsing the apexes of DNS packets and checking them against a
hash table, because the program won't pass the verifier.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
       [not found] ` <CALx6S35R_ZStV=DbD-7Gf_y5xXqQq113_6m5p-p0GQfv46v0Ow@mail.gmail.com>
@ 2016-12-01 18:02   ` Tom Herbert
  0 siblings, 0 replies; 40+ messages in thread
From: Tom Herbert @ 2016-12-01 18:02 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Linux Kernel Network Developers

On Thu, Dec 1, 2016 at 10:01 AM, Tom Herbert <tom@herbertland.com> wrote:
>
>
> On Thu, Dec 1, 2016 at 1:11 AM, Florian Westphal <fw@strlen.de> wrote:
>>
>> [ As already mentioned in my reply to Tom, here is
>> the xdp flamebait/critique ]
>>
>> Lots of XDP related patches started to appear on netdev.
>> I'd prefer if it would stop...
>>
>> To me XDP combines all disadvantages of stack bypass solutions like dpdk
>> with the disadvantages of kernel programming with a more limited
>> instruction set and toolchain.
>>
>> Unlike XDP userspace bypass (dpdk et al) allow use of any programming
>> model or language you want (including scripting languages), which
>> makes things a lot easier, e.g. garbage collection, debuggers vs.
>> crash+vmcore+printk...
>>
>> I have heared the argument that these restrictions that come with
>> XDP are great because it allows to 'limit what users can do'.
>>
>> Given existence of DPDK/netmap/userspace bypass is a reality, this is
>> a very weak argument -- why would anyone pick XDP over a dpdk/netmap
>> based solution?
>
Because, we've seen time an time again that attempts to bypass the
stack and run parallel stacks under the banner of "the kernel is too
slow" does not scale for large deployment. We've seen this with RDMA,
TOE, OpenOnload, and we'll see this for DPDK, FD.io, VPP and whatever
else people are going to dream up. If I have a couple hundred machines
running a single application like the HFT guys do, then sure I'd
probably look into such solutions. But when I have datacenters with
100Ks running an assortment of applications even contemplating the
possibility of deploying a parallel stacks gives me headache. We need
to consider an seemingly endless list of security issues,
manageability. robustness, protocol compatibility, etc. I really have
little interest in bringing a huge pile of 3rd party code that I have
to support, and I definitely have no interest in constantly replacing
all of my hardware to get the latest and greatest support for these
offloads as vendors leak them out. Given a choice between buying into
some kernel bypass solution versus hacking Linux a little bit to carve
out an accelerated data path to address the "kernel is too slow"
argument, I will choose the latter any day of the week.

Tom

>
> Because, we've seen time an time again that attempts to bypass the stack and
> run parallel stacks under the banner of "the kernel is too slow" does not
> scale for large deployment. We've seen this with RDMA, TOE, OpenOnload, and
> we'll see this for DPDK, FD.io, VPP and whatever else people are going to
> dream up. If I have a couple hundred machines running a single application
> like the HFT guys do, then sure I'd probably look into such solutions. But
> when I have datacenters with 100Ks running an assortment of applications
> even contemplating the possibility of deploying a parallel stacks gives me
> headache. We need to consider an seemingly endless list of security issues,
> manageability. robustness, protocol compatibility, etc. I really have little
> interest in bringing a huge pile of 3rd party code that I have to support,
> and I definitely have no interest in constantly replacing all of my hardware
> to get the latest and greatest support for these offloads as vendors leak
> them out. Given a choice between buying into some kernel bypass solution
> versus hacking Linux a little bit to carve out an accelerated data path to
> address the "kernel is too slow" argument, I will choose the latter any day
> of the week.
>
> Tom
>
>> XDP will always be less powerful and a lot more complicated,
>> especially considering users of dpdk (or toolkits built on top of it)
>> are not kernel programmers and userspace has more powerful ipc
>> (or storage) mechanisms.
>>
>> Aside from this, XDP, like DPDK, is a kernel bypass.
>> You might say 'Its just stack bypass, not a kernel bypass!'.
>> But what does that mean exactly?  That packets can still be passed
>> onward to normal stack?
>> Bypass solutions like netmap can also inject packets back to
>> kernel stack again.
>>
>> Running less powerful user code in a restricted environment in the kernel
>> address space is certainly a worse idea than separating this logic out
>> to user space.
>>
>> In light of DPDKs existence it make a lot more sense to me to provide
>> a). a faster mmap based interface (possibly AF_PACKET based) that allows
>> to map nic directly into userspace, detaching tx/rx queue from kernel.
>>
>> John Fastabend sent something like this last year as a proof of
>> concept, iirc it was rejected because register space got exposed directly
>> to userspace.  I think we should re-consider merging netmap
>> (or something conceptually close to its design).
>>
>> b). with regards to a programmable data path: IFF one wants to do this
>> in kernel (and thats a big if), it seems much more preferrable to provide
>> a config/data-based approach rather than a programmable one.  If you want
>> full freedom DPDK is architecturally just too powerful to compete with.
>>
>> Proponents of XDP sometimes provide usage examples.
>> Lets look at some of these.
>>
>> == Application developement: ==
>> * DNS Server
>> data structures and algorithms need to be implemented in a mostly touring
>> complete language, so eBPF cannot readily be be used for that.
>> At least it will be orders of magnitude harder than in userspace.
>>
>> * TCP Endpoint
>> TCP processing in eBPF is a bit out of question while userspace tcp stacks
>> based on both netmap and dpdk already exist today.
>>
>> == Forwarding dataplane: ==
>>
>> * Router/Switch
>> Router and switches should actually adhere to standardized and specified
>> protocols and thus don't need a lot of custom software and specialized
>> software.  Still a lot more work compared to userspace offloads where
>> you can do things like allocating a 4GB array to perform nexthop lookup.
>> Also needs ability to perform tx on another interface.
>>
>> * Load balancer
>> State holding algorithm need sorting and searching, so also no fit for
>> eBPF (could be exposed by function exports, but then can we do DoS by
>> finding worst case scenarios?).
>>
>> Also again needs way to forward frame out via another interface.
>>
>> For cases where packet gets sent out via same interface it would appear
>> to be easier to use port mirroring in a switch and use stochastic
>> filtering
>> on end nodes to determine which host should take responsibility.
>>
>> XDP plus: central authority over how distribution will work in case
>> nodes are added/removed from pool.
>> But then again, it will be easier to hande this with netmap/dpdk where
>> more complicated scheduling algorithms can be used.
>>
>> * early drop/filtering.
>> While its possible to do "u32" like filters with ebpf, all modern nics
>> support ntuple filtering in hardware, which is going to be faster because
>> such packet will never even be signalled to the operating system.
>> For more complicated cases (e.g. doing socket lookup to check if
>> particular
>> packet does match bound socket (and expected sequence numbers etc) I don't
>> see easy ways to do that with XDP (and without sk_buff context).
>> Providing it via function exports is possible of course, but that will
>> only
>> result in an "arms race" where we will see special-sauce functions
>> all over the place -- DoS will always attempt to go for something
>> that is difficult to filter against, cf. all the recent volume-based
>> floodings.
>>
>> Thanks, Florian
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 16:28     ` Thomas Graf
@ 2016-12-01 20:44       ` Hannes Frederic Sowa
  2016-12-01 21:12         ` Tom Herbert
  0 siblings, 1 reply; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-01 20:44 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Florian Westphal, netdev

Hello,

this is a good conversation and I simply want to bring my worries
across. I don't have good solutions for the problems XDP tries to solve
but I fear we could get caught up in maintenance problems in the long
term given the ideas floating around on how to evolve XDP currently.

On 01.12.2016 17:28, Thomas Graf wrote:
> On 12/01/16 at 04:52pm, Hannes Frederic Sowa wrote:
>> First of all, this is a rant targeted at XDP and not at eBPF as a whole.
>> XDP manipulates packets at free will and thus all security guarantees
>> are off as well as in any user space solution.
>>
>> Secondly user space provides policy, acl, more controlled memory
>> protection, restartability and better debugability. If I had multi
>> tenant workloads I would definitely put more complex "business/acl"
>> logic into user space, so I can make use of LSM and other features to
>> especially prevent a network facing service to attack the tenants. If
>> stuff gets put into the kernel you run user controlled code in the
>> kernel exposing a much bigger attack vector.
>>
>> What use case do you see in XDP specifically e.g. for container networking?
> 
> DDOS mitigation to protect distributed applications in large clusters.
> Relying on CDN works to protect API gateways and frontends (as long as
> they don't throw you out of their network) but offers no protection
> beyond that, e.g. a noisy/hostile neighbour. Doing this at the server
> level and allowing the mitigation capability to scale up with the number
> of servers is natural and cheap.

So far we e.g. always considered L2 attacks a problem of the network
admin to correctly protect the environment. Are you talking about
protecting the L3 data plane? Are there custom proprietary protocols in
place which need custom protocol parsers that need involvement of the
kernel before it could verify the packet?

In the past we tried to protect the L3 data plane as good as we can in
Linux to allow the plain old server admin to set an IP address on an
interface and install whatever software in user space. We try not only
to protect it but also try to achieve fairness by adding a lot of
counters everywhere. Are protections missing right now or are we talking
about better performance?

To provide fairness you often have to share validated data within the
kernel and with XDP. This requires consistent lookup methods for sockets
in the lower level. Those can be exported to XDP via external functions
and become part of uAPI which will limit our ability to change those
functions in future. When the discussion started about early demuxing in
XDP I became really nervous, because suddenly the XDP program has to
decide correctly which protocol type it has and look in the correct
socket table for the socket. Different semantics for sockets can apply
here, e.g. some sockets are RCU managed, some end up using reference
counts. A wrong decision here would cause havoc in the kernel (XDP
considers packet as UDP but kernel stack as TCP). Also, who knows that
we won't have per-cpu socket tables we would keep that as uAPI (this is
btw. the dragonflyBSD approach to scaling)? Imagine someone writing a
SIP rewriter in XDP and depending on a coherent view of all sockets even
if their hash doesn't fit to the one of the queue? Suddenly something
which was thought of as being only mutable by one CPU becomes global
again and because of XDP we need to add locking because of uAPI.

This discussion is parallel to the discussion about trace points, which
are not considered uAPI. If eBPF functions are not considered uAPI then
eBPF in the network stack will have much less value, because you
suddenly depend on specific kernel versions again and cannot simply load
the code into the kernel. The API checks will become very difficult to
implement, see also the ongoing MODVERSIONS discussions on LKML some
days back.

>>> I agree with you if the LB is a software based appliance in either a
>>> dedicated VM or on dedicated baremetal.
>>>
>>> The reality is turning out to be different in many cases though, LB
>>> needs to be performed not only for north south but east west as well.
>>> So even if I would handle LB for traffic entering my datacenter in user
>>> space, I will need the same LB for packets from my applications and
>>> I definitely don't want to move all of that into user space.
>>
>> The open question to me is why is programmability needed here.
>>
>> Look at the discussion about ECMP and consistent hashing. It is not very
>> easy to actually write this code correctly. Why can't we just put C code
>> into the kernel that implements this once and for all and let user space
>> update the policies?
> 
> Whatever LB logic is put in place with native C code now is unlikely the
> logic we need in two years. We can't really predict the future. If it
> was the case, networking would have been done long ago and we would all
> be working on self eating ice cream now.

Did LB algorithms on the networking layer change that much?

There is a long history of using consistent hashing for load balancing,
as e.g. is done in haproxy or F5.

>> Load balancers have to deal correctly with ICMP packets, e.g. they even
>> have to be duplicated to every ECMP route. This seems to be problematic
>> to do in eBPF programs due to looping constructs so you end up with
>> complicated user space anyway.
> 
> Feel free to implement such complex LBs in user space or natively. It is
> not required for the majority of use cases. The most popular LBs for
> application load balancing have no idea of ECMP and require ECMP aware
> routers to be made redundant itself.

They are already available and e.g. deployed as part of some kubernetes
stacks as I wrote above.

It is a generally available algorithm which fits a lot of use cases,
basically every website that wants to shard its sessions can make use of
it. Also it is independent of ECMP and mostly is implemented in load
balancers due to its need for a lot of memory.

New algorithms outdate old ones but the core principles will be the same
and don't require major changes to the interface, e.g. ipvs scheduler.

If we are talking about security features for early drop inside TCP
streams, like http, you need to have a proper stream reassembly engine.
Snort e.g. dropped a complete stream of TCP packets if you send a RST
with the same quadruple but a wrong sequence number. End system didn't
consider the RST but non synchronized solutions ended up not inspecting
this flow anymore. How do you handle diverting views on meta data in
networking protocols? Also look how hard it is to keep e.g. the fib
table synchronized to the hardware.

In retrospect, I think Tom Herbert's move putting ILA stateless
translation into the XDP hook wasn't that bad after all. ILA maybe
hopefully becomes a standard and its implementation is already in the
kernel so why keep its translator not part of the kernel, too?

TLDR; what I'm trying to argue is that evolution of the network stack is
problematic with a programmable backplane in the kernel which locks out
future modifications of the stack in some places. On the other side, if
we don't add those features we will have a half baked solution and
people will simply prefer netmap or DPDK.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 20:44       ` Hannes Frederic Sowa
@ 2016-12-01 21:12         ` Tom Herbert
  2016-12-01 21:27           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 40+ messages in thread
From: Tom Herbert @ 2016-12-01 21:12 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Thomas Graf, Florian Westphal, Linux Kernel Network Developers

On Thu, Dec 1, 2016 at 12:44 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hello,
>
> this is a good conversation and I simply want to bring my worries
> across. I don't have good solutions for the problems XDP tries to solve
> but I fear we could get caught up in maintenance problems in the long
> term given the ideas floating around on how to evolve XDP currently.
>
> On 01.12.2016 17:28, Thomas Graf wrote:
>> On 12/01/16 at 04:52pm, Hannes Frederic Sowa wrote:
>>> First of all, this is a rant targeted at XDP and not at eBPF as a whole.
>>> XDP manipulates packets at free will and thus all security guarantees
>>> are off as well as in any user space solution.
>>>
>>> Secondly user space provides policy, acl, more controlled memory
>>> protection, restartability and better debugability. If I had multi
>>> tenant workloads I would definitely put more complex "business/acl"
>>> logic into user space, so I can make use of LSM and other features to
>>> especially prevent a network facing service to attack the tenants. If
>>> stuff gets put into the kernel you run user controlled code in the
>>> kernel exposing a much bigger attack vector.
>>>
>>> What use case do you see in XDP specifically e.g. for container networking?
>>
>> DDOS mitigation to protect distributed applications in large clusters.
>> Relying on CDN works to protect API gateways and frontends (as long as
>> they don't throw you out of their network) but offers no protection
>> beyond that, e.g. a noisy/hostile neighbour. Doing this at the server
>> level and allowing the mitigation capability to scale up with the number
>> of servers is natural and cheap.
>
> So far we e.g. always considered L2 attacks a problem of the network
> admin to correctly protect the environment. Are you talking about
> protecting the L3 data plane? Are there custom proprietary protocols in
> place which need custom protocol parsers that need involvement of the
> kernel before it could verify the packet?
>
> In the past we tried to protect the L3 data plane as good as we can in
> Linux to allow the plain old server admin to set an IP address on an
> interface and install whatever software in user space. We try not only
> to protect it but also try to achieve fairness by adding a lot of
> counters everywhere. Are protections missing right now or are we talking
> about better performance?
>
The technical plenary at last IETF on Seoul a couple of weeks ago was
exclusively focussed on DDOS in light of the recent attack against
Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare
presentation by Nick Sullivan
(https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
alluded to some implementation of DDOS mitigation. In particular, on
slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"
numbers he gave we're based in iptables+BPF and that was a whole
1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic
and that's also when I introduced XDP to whole IETF :-) ). If that's
the best we can do the Internet is in a world hurt. DDOS mitigation
alone is probably a sufficient motivation to look at XDP. We need
something that drops bad packets as quickly as possible when under
attack, we need this to be integrated into the stack, we need it to be
programmable to deal with the increasing savvy of attackers, and we
don't want to be forced to be dependent on HW solutions. This is why
we created XDP!

Tom

> To provide fairness you often have to share validated data within the
> kernel and with XDP. This requires consistent lookup methods for sockets
> in the lower level. Those can be exported to XDP via external functions
> and become part of uAPI which will limit our ability to change those
> functions in future. When the discussion started about early demuxing in
> XDP I became really nervous, because suddenly the XDP program has to
> decide correctly which protocol type it has and look in the correct
> socket table for the socket. Different semantics for sockets can apply
> here, e.g. some sockets are RCU managed, some end up using reference
> counts. A wrong decision here would cause havoc in the kernel (XDP
> considers packet as UDP but kernel stack as TCP). Also, who knows that
> we won't have per-cpu socket tables we would keep that as uAPI (this is
> btw. the dragonflyBSD approach to scaling)? Imagine someone writing a
> SIP rewriter in XDP and depending on a coherent view of all sockets even
> if their hash doesn't fit to the one of the queue? Suddenly something
> which was thought of as being only mutable by one CPU becomes global
> again and because of XDP we need to add locking because of uAPI.
>
> This discussion is parallel to the discussion about trace points, which
> are not considered uAPI. If eBPF functions are not considered uAPI then
> eBPF in the network stack will have much less value, because you
> suddenly depend on specific kernel versions again and cannot simply load
> the code into the kernel. The API checks will become very difficult to
> implement, see also the ongoing MODVERSIONS discussions on LKML some
> days back.
>
>>>> I agree with you if the LB is a software based appliance in either a
>>>> dedicated VM or on dedicated baremetal.
>>>>
>>>> The reality is turning out to be different in many cases though, LB
>>>> needs to be performed not only for north south but east west as well.
>>>> So even if I would handle LB for traffic entering my datacenter in user
>>>> space, I will need the same LB for packets from my applications and
>>>> I definitely don't want to move all of that into user space.
>>>
>>> The open question to me is why is programmability needed here.
>>>
>>> Look at the discussion about ECMP and consistent hashing. It is not very
>>> easy to actually write this code correctly. Why can't we just put C code
>>> into the kernel that implements this once and for all and let user space
>>> update the policies?
>>
>> Whatever LB logic is put in place with native C code now is unlikely the
>> logic we need in two years. We can't really predict the future. If it
>> was the case, networking would have been done long ago and we would all
>> be working on self eating ice cream now.
>
> Did LB algorithms on the networking layer change that much?
>
> There is a long history of using consistent hashing for load balancing,
> as e.g. is done in haproxy or F5.
>
>>> Load balancers have to deal correctly with ICMP packets, e.g. they even
>>> have to be duplicated to every ECMP route. This seems to be problematic
>>> to do in eBPF programs due to looping constructs so you end up with
>>> complicated user space anyway.
>>
>> Feel free to implement such complex LBs in user space or natively. It is
>> not required for the majority of use cases. The most popular LBs for
>> application load balancing have no idea of ECMP and require ECMP aware
>> routers to be made redundant itself.
>
> They are already available and e.g. deployed as part of some kubernetes
> stacks as I wrote above.
>
> It is a generally available algorithm which fits a lot of use cases,
> basically every website that wants to shard its sessions can make use of
> it. Also it is independent of ECMP and mostly is implemented in load
> balancers due to its need for a lot of memory.
>
> New algorithms outdate old ones but the core principles will be the same
> and don't require major changes to the interface, e.g. ipvs scheduler.
>
> If we are talking about security features for early drop inside TCP
> streams, like http, you need to have a proper stream reassembly engine.
> Snort e.g. dropped a complete stream of TCP packets if you send a RST
> with the same quadruple but a wrong sequence number. End system didn't
> consider the RST but non synchronized solutions ended up not inspecting
> this flow anymore. How do you handle diverting views on meta data in
> networking protocols? Also look how hard it is to keep e.g. the fib
> table synchronized to the hardware.
>
> In retrospect, I think Tom Herbert's move putting ILA stateless
> translation into the XDP hook wasn't that bad after all. ILA maybe
> hopefully becomes a standard and its implementation is already in the
> kernel so why keep its translator not part of the kernel, too?
>
> TLDR; what I'm trying to argue is that evolution of the network stack is
> problematic with a programmable backplane in the kernel which locks out
> future modifications of the stack in some places. On the other side, if
> we don't add those features we will have a half baked solution and
> people will simply prefer netmap or DPDK.
>
> Bye,
> Hannes
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 21:12         ` Tom Herbert
@ 2016-12-01 21:27           ` Hannes Frederic Sowa
  2016-12-01 21:51             ` Tom Herbert
  2016-12-02 18:39             ` bpf bounded loops. Was: [flamebait] xdp Alexei Starovoitov
  0 siblings, 2 replies; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-01 21:27 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Thomas Graf, Florian Westphal, Linux Kernel Network Developers

On 01.12.2016 22:12, Tom Herbert wrote:
> On Thu, Dec 1, 2016 at 12:44 PM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> Hello,
>>
>> this is a good conversation and I simply want to bring my worries
>> across. I don't have good solutions for the problems XDP tries to solve
>> but I fear we could get caught up in maintenance problems in the long
>> term given the ideas floating around on how to evolve XDP currently.
>>
>> On 01.12.2016 17:28, Thomas Graf wrote:
>>> On 12/01/16 at 04:52pm, Hannes Frederic Sowa wrote:
>>>> First of all, this is a rant targeted at XDP and not at eBPF as a whole.
>>>> XDP manipulates packets at free will and thus all security guarantees
>>>> are off as well as in any user space solution.
>>>>
>>>> Secondly user space provides policy, acl, more controlled memory
>>>> protection, restartability and better debugability. If I had multi
>>>> tenant workloads I would definitely put more complex "business/acl"
>>>> logic into user space, so I can make use of LSM and other features to
>>>> especially prevent a network facing service to attack the tenants. If
>>>> stuff gets put into the kernel you run user controlled code in the
>>>> kernel exposing a much bigger attack vector.
>>>>
>>>> What use case do you see in XDP specifically e.g. for container networking?
>>>
>>> DDOS mitigation to protect distributed applications in large clusters.
>>> Relying on CDN works to protect API gateways and frontends (as long as
>>> they don't throw you out of their network) but offers no protection
>>> beyond that, e.g. a noisy/hostile neighbour. Doing this at the server
>>> level and allowing the mitigation capability to scale up with the number
>>> of servers is natural and cheap.
>>
>> So far we e.g. always considered L2 attacks a problem of the network
>> admin to correctly protect the environment. Are you talking about
>> protecting the L3 data plane? Are there custom proprietary protocols in
>> place which need custom protocol parsers that need involvement of the
>> kernel before it could verify the packet?
>>
>> In the past we tried to protect the L3 data plane as good as we can in
>> Linux to allow the plain old server admin to set an IP address on an
>> interface and install whatever software in user space. We try not only
>> to protect it but also try to achieve fairness by adding a lot of
>> counters everywhere. Are protections missing right now or are we talking
>> about better performance?
>>
> The technical plenary at last IETF on Seoul a couple of weeks ago was
> exclusively focussed on DDOS in light of the recent attack against
> Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare
> presentation by Nick Sullivan
> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
> alluded to some implementation of DDOS mitigation. In particular, on
> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"
> numbers he gave we're based in iptables+BPF and that was a whole
> 1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic
> and that's also when I introduced XDP to whole IETF :-) ). If that's
> the best we can do the Internet is in a world hurt. DDOS mitigation
> alone is probably a sufficient motivation to look at XDP. We need
> something that drops bad packets as quickly as possible when under
> attack, we need this to be integrated into the stack, we need it to be
> programmable to deal with the increasing savvy of attackers, and we
> don't want to be forced to be dependent on HW solutions. This is why
> we created XDP!

I totally understand that. But in my reply to David in this thread I
mentioned DNS apex processing as being problematic which is actually
being referred in your linked slide deck on page 9 ("What do floods look
like") and the problematic of parsing DNS packets in XDP due to string
processing and looping inside eBPF.

Not to mention the fact that you might have to deal with fragments in
the Internet. Some DOS mitigations were already abused to generate
blackholes for other users. Filtering such stuff is quite complicated.

I argued also under the aspect of what Thomas said, that the outside
world of the cluster is already protected by a CDN.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 21:27           ` Hannes Frederic Sowa
@ 2016-12-01 21:51             ` Tom Herbert
  2016-12-02 10:24               ` Jesper Dangaard Brouer
  2016-12-02 18:39             ` bpf bounded loops. Was: [flamebait] xdp Alexei Starovoitov
  1 sibling, 1 reply; 40+ messages in thread
From: Tom Herbert @ 2016-12-01 21:51 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Thomas Graf, Florian Westphal, Linux Kernel Network Developers

On Thu, Dec 1, 2016 at 1:27 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 01.12.2016 22:12, Tom Herbert wrote:
>> On Thu, Dec 1, 2016 at 12:44 PM, Hannes Frederic Sowa
>> <hannes@stressinduktion.org> wrote:
>>> Hello,
>>>
>>> this is a good conversation and I simply want to bring my worries
>>> across. I don't have good solutions for the problems XDP tries to solve
>>> but I fear we could get caught up in maintenance problems in the long
>>> term given the ideas floating around on how to evolve XDP currently.
>>>
>>> On 01.12.2016 17:28, Thomas Graf wrote:
>>>> On 12/01/16 at 04:52pm, Hannes Frederic Sowa wrote:
>>>>> First of all, this is a rant targeted at XDP and not at eBPF as a whole.
>>>>> XDP manipulates packets at free will and thus all security guarantees
>>>>> are off as well as in any user space solution.
>>>>>
>>>>> Secondly user space provides policy, acl, more controlled memory
>>>>> protection, restartability and better debugability. If I had multi
>>>>> tenant workloads I would definitely put more complex "business/acl"
>>>>> logic into user space, so I can make use of LSM and other features to
>>>>> especially prevent a network facing service to attack the tenants. If
>>>>> stuff gets put into the kernel you run user controlled code in the
>>>>> kernel exposing a much bigger attack vector.
>>>>>
>>>>> What use case do you see in XDP specifically e.g. for container networking?
>>>>
>>>> DDOS mitigation to protect distributed applications in large clusters.
>>>> Relying on CDN works to protect API gateways and frontends (as long as
>>>> they don't throw you out of their network) but offers no protection
>>>> beyond that, e.g. a noisy/hostile neighbour. Doing this at the server
>>>> level and allowing the mitigation capability to scale up with the number
>>>> of servers is natural and cheap.
>>>
>>> So far we e.g. always considered L2 attacks a problem of the network
>>> admin to correctly protect the environment. Are you talking about
>>> protecting the L3 data plane? Are there custom proprietary protocols in
>>> place which need custom protocol parsers that need involvement of the
>>> kernel before it could verify the packet?
>>>
>>> In the past we tried to protect the L3 data plane as good as we can in
>>> Linux to allow the plain old server admin to set an IP address on an
>>> interface and install whatever software in user space. We try not only
>>> to protect it but also try to achieve fairness by adding a lot of
>>> counters everywhere. Are protections missing right now or are we talking
>>> about better performance?
>>>
>> The technical plenary at last IETF on Seoul a couple of weeks ago was
>> exclusively focussed on DDOS in light of the recent attack against
>> Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare
>> presentation by Nick Sullivan
>> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
>> alluded to some implementation of DDOS mitigation. In particular, on
>> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"
>> numbers he gave we're based in iptables+BPF and that was a whole
>> 1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic
>> and that's also when I introduced XDP to whole IETF :-) ). If that's
>> the best we can do the Internet is in a world hurt. DDOS mitigation
>> alone is probably a sufficient motivation to look at XDP. We need
>> something that drops bad packets as quickly as possible when under
>> attack, we need this to be integrated into the stack, we need it to be
>> programmable to deal with the increasing savvy of attackers, and we
>> don't want to be forced to be dependent on HW solutions. This is why
>> we created XDP!
>
> I totally understand that. But in my reply to David in this thread I
> mentioned DNS apex processing as being problematic which is actually
> being referred in your linked slide deck on page 9 ("What do floods look
> like") and the problematic of parsing DNS packets in XDP due to string
> processing and looping inside eBPF.
>
I agree that eBPF is not going to be sufficient from everything we'll
want to do. Undoubtably, we'll continue see new addition of more
helpers to assist in processing, but at some point we will want a to
load a kernel module that handles more complex processing and insert
it at the XDP callout. Nothing in the design of XDP precludes doing
that and I have already posted the patches to generalize the XDP
callout for that. Taking either of these routes has tradeoffs, but
regardless of whether this is BPF or module code, the principles of
XDP and its value to help solve some class of problems remains.

 Tom

> Not to mention the fact that you might have to deal with fragments in
> the Internet. Some DOS mitigations were already abused to generate
> blackholes for other users. Filtering such stuff is quite complicated.
>
> I argued also under the aspect of what Thomas said, that the outside
> world of the cluster is already protected by a CDN.
>
> Bye,
> Hannes
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01 21:51             ` Tom Herbert
@ 2016-12-02 10:24               ` Jesper Dangaard Brouer
  2016-12-02 11:54                 ` Hannes Frederic Sowa
  0 siblings, 1 reply; 40+ messages in thread
From: Jesper Dangaard Brouer @ 2016-12-02 10:24 UTC (permalink / raw)
  To: Tom Herbert
  Cc: brouer, Hannes Frederic Sowa, Thomas Graf, Florian Westphal,
	Linux Kernel Network Developers

On Thu, 1 Dec 2016 13:51:32 -0800
Tom Herbert <tom@herbertland.com> wrote:

> >> The technical plenary at last IETF on Seoul a couple of weeks ago was
> >> exclusively focussed on DDOS in light of the recent attack against
> >> Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare
> >> presentation by Nick Sullivan
> >> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
> >> alluded to some implementation of DDOS mitigation. In particular, on
> >> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"

slide 14

> >> numbers he gave we're based in iptables+BPF and that was a whole
> >> 1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic
> >> and that's also when I introduced XDP to whole IETF :-) ). If that's
> >> the best we can do the Internet is in a world hurt. DDOS mitigation
> >> alone is probably a sufficient motivation to look at XDP. We need
> >> something that drops bad packets as quickly as possible when under
> >> attack, we need this to be integrated into the stack, we need it to be
> >> programmable to deal with the increasing savvy of attackers, and we
> >> don't want to be forced to be dependent on HW solutions. This is why
> >> we created XDP!  

The 1.2Mpps number is a bit low, but we are unfortunately in that
ballpark.

> > I totally understand that. But in my reply to David in this thread I
> > mentioned DNS apex processing as being problematic which is actually
> > being referred in your linked slide deck on page 9 ("What do floods look
> > like") and the problematic of parsing DNS packets in XDP due to string
> > processing and looping inside eBPF.

That is a weak argument. You do realize CloudFlare actually use eBPF to
do this exact filtering, and (so-far) eBPF for parsing DNS have been
sufficient for them.

> I agree that eBPF is not going to be sufficient from everything we'll
> want to do. Undoubtably, we'll continue see new addition of more
> helpers to assist in processing, but at some point we will want a to
> load a kernel module that handles more complex processing and insert
> it at the XDP callout. Nothing in the design of XDP precludes doing
> that and I have already posted the patches to generalize the XDP
> callout for that. Taking either of these routes has tradeoffs, but
> regardless of whether this is BPF or module code, the principles of
> XDP and its value to help solve some class of problems remains.

As I've said before, I do support Tom's patches for a more generic XDP
hook that the kernel itself can use.  The first thing I would implement
with this is a fast-path for Linux L2 bridging (do depend on multiport
TX support). It would be so easy to speedup bridging, XDP would only
need to forward packets already in the bridge-FIB table, rest is
XDP_PASS to normal stack and bridge code (timers etc).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-02 10:24               ` Jesper Dangaard Brouer
@ 2016-12-02 11:54                 ` Hannes Frederic Sowa
  2016-12-02 16:59                   ` Tom Herbert
  0 siblings, 1 reply; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-02 11:54 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Tom Herbert
  Cc: Thomas Graf, Florian Westphal, Linux Kernel Network Developers

On 02.12.2016 11:24, Jesper Dangaard Brouer wrote:
> On Thu, 1 Dec 2016 13:51:32 -0800
> Tom Herbert <tom@herbertland.com> wrote:
> 
>>>> The technical plenary at last IETF on Seoul a couple of weeks ago was
>>>> exclusively focussed on DDOS in light of the recent attack against
>>>> Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare
>>>> presentation by Nick Sullivan
>>>> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
>>>> alluded to some implementation of DDOS mitigation. In particular, on
>>>> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"
> 
> slide 14
> 
>>>> numbers he gave we're based in iptables+BPF and that was a whole
>>>> 1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic
>>>> and that's also when I introduced XDP to whole IETF :-) ). If that's
>>>> the best we can do the Internet is in a world hurt. DDOS mitigation
>>>> alone is probably a sufficient motivation to look at XDP. We need
>>>> something that drops bad packets as quickly as possible when under
>>>> attack, we need this to be integrated into the stack, we need it to be
>>>> programmable to deal with the increasing savvy of attackers, and we
>>>> don't want to be forced to be dependent on HW solutions. This is why
>>>> we created XDP!  
> 
> The 1.2Mpps number is a bit low, but we are unfortunately in that
> ballpark.
> 
>>> I totally understand that. But in my reply to David in this thread I
>>> mentioned DNS apex processing as being problematic which is actually
>>> being referred in your linked slide deck on page 9 ("What do floods look
>>> like") and the problematic of parsing DNS packets in XDP due to string
>>> processing and looping inside eBPF.
> 
> That is a weak argument. You do realize CloudFlare actually use eBPF to
> do this exact filtering, and (so-far) eBPF for parsing DNS have been
> sufficient for them.

You are talking about this code on the following slides (I actually
transcribed it for you here and disassembled):

l0:	ld #0x14
l1:	ldxb 4*([0]&0xf)
l2:	add x
l3:	tax
l4:	ld [x+0]
l5:	jeq #0x7657861, l6, l13
l6:	ld [x+4]
l7:	jeq #0x6d706c65, l8, l13
l8:	ld [x+8]
l9:	jeq #0x3636f6d, l10, l13
l10:	ldb [x+12]
l11:	jeq #0, l12, l13
l12:	ret #0x1
l13:	ret #0

You can offload this to u32 in hardware if that is what you want.

The reason this works is because of netfilter, which allows them to
dynamically generate BPF programs and insert and delete them from
chains, do intersection or unions of them.

If you have a freestanding program like in XDP the complexity space is a
different one and not comparable to this at all.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-02 11:54                 ` Hannes Frederic Sowa
@ 2016-12-02 16:59                   ` Tom Herbert
  2016-12-02 18:12                     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 40+ messages in thread
From: Tom Herbert @ 2016-12-02 16:59 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jesper Dangaard Brouer, Thomas Graf, Florian Westphal,
	Linux Kernel Network Developers

On Fri, Dec 2, 2016 at 3:54 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 02.12.2016 11:24, Jesper Dangaard Brouer wrote:
>> On Thu, 1 Dec 2016 13:51:32 -0800
>> Tom Herbert <tom@herbertland.com> wrote:
>>
>>>>> The technical plenary at last IETF on Seoul a couple of weeks ago was
>>>>> exclusively focussed on DDOS in light of the recent attack against
>>>>> Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare
>>>>> presentation by Nick Sullivan
>>>>> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
>>>>> alluded to some implementation of DDOS mitigation. In particular, on
>>>>> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"
>>
>> slide 14
>>
>>>>> numbers he gave we're based in iptables+BPF and that was a whole
>>>>> 1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic
>>>>> and that's also when I introduced XDP to whole IETF :-) ). If that's
>>>>> the best we can do the Internet is in a world hurt. DDOS mitigation
>>>>> alone is probably a sufficient motivation to look at XDP. We need
>>>>> something that drops bad packets as quickly as possible when under
>>>>> attack, we need this to be integrated into the stack, we need it to be
>>>>> programmable to deal with the increasing savvy of attackers, and we
>>>>> don't want to be forced to be dependent on HW solutions. This is why
>>>>> we created XDP!
>>
>> The 1.2Mpps number is a bit low, but we are unfortunately in that
>> ballpark.
>>
>>>> I totally understand that. But in my reply to David in this thread I
>>>> mentioned DNS apex processing as being problematic which is actually
>>>> being referred in your linked slide deck on page 9 ("What do floods look
>>>> like") and the problematic of parsing DNS packets in XDP due to string
>>>> processing and looping inside eBPF.
>>
>> That is a weak argument. You do realize CloudFlare actually use eBPF to
>> do this exact filtering, and (so-far) eBPF for parsing DNS have been
>> sufficient for them.
>
> You are talking about this code on the following slides (I actually
> transcribed it for you here and disassembled):
>
> l0:     ld #0x14
> l1:     ldxb 4*([0]&0xf)
> l2:     add x
> l3:     tax
> l4:     ld [x+0]
> l5:     jeq #0x7657861, l6, l13
> l6:     ld [x+4]
> l7:     jeq #0x6d706c65, l8, l13
> l8:     ld [x+8]
> l9:     jeq #0x3636f6d, l10, l13
> l10:    ldb [x+12]
> l11:    jeq #0, l12, l13
> l12:    ret #0x1
> l13:    ret #0
>
> You can offload this to u32 in hardware if that is what you want.
>
> The reason this works is because of netfilter, which allows them to
> dynamically generate BPF programs and insert and delete them from
> chains, do intersection or unions of them.
>
> If you have a freestanding program like in XDP the complexity space is a
> different one and not comparable to this at all.
>
I don't understand this comment about complexity especially in regards
to the idea of offloading u32 to hardware. Relying on hardware to do
anything always leads to more complexity than an equivalent SW
implementation for the same functionality. The only reason we ever use
a hardware mechanisms is if it gives *significantly* better
performance. If the performance difference isn't there then doing
things in SW is going to be the better path (as we see in XDP).

Tom

> Bye,
> Hannes
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-01  9:11 [flamebait] xdp, well meaning but pointless Florian Westphal
                   ` (2 preceding siblings ...)
       [not found] ` <CALx6S35R_ZStV=DbD-7Gf_y5xXqQq113_6m5p-p0GQfv46v0Ow@mail.gmail.com>
@ 2016-12-02 17:22 ` Jesper Dangaard Brouer
  2016-12-03 16:19   ` Willem de Bruijn
  3 siblings, 1 reply; 40+ messages in thread
From: Jesper Dangaard Brouer @ 2016-12-02 17:22 UTC (permalink / raw)
  To: Florian Westphal; +Cc: brouer, netdev


On Thu, 1 Dec 2016 10:11:08 +0100 Florian Westphal <fw@strlen.de> wrote:

> In light of DPDKs existence it make a lot more sense to me to provide
> a). a faster mmap based interface (possibly AF_PACKET based) that allows
> to map nic directly into userspace, detaching tx/rx queue from kernel.
> 
> John Fastabend sent something like this last year as a proof of
> concept, iirc it was rejected because register space got exposed directly
> to userspace.  I think we should re-consider merging netmap
> (or something conceptually close to its design).

I'm actually working in this direction, of zero-copy RX mapping packets
into userspace.  This work is mostly related to page_pool, and I only
plan to use XDP as a filter for selecting packets going to userspace,
as this choice need to be taken very early.

My design is here:
 https://prototype-kernel.readthedocs.io/en/latest/vm/page_pool/design/memory_model_nic.html

This is mostly about changing the memory model in the drivers, to allow
for safely mapping pages to userspace.  (An efficient queue mechanism is
not covered).  People often overlook that netmap's efficiency *also* comes
from introducing pre-mapping memory/pages to userspace.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-02 16:59                   ` Tom Herbert
@ 2016-12-02 18:12                     ` Hannes Frederic Sowa
  2016-12-02 19:56                       ` Stephen Hemminger
  0 siblings, 1 reply; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-02 18:12 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jesper Dangaard Brouer, Thomas Graf, Florian Westphal,
	Linux Kernel Network Developers

On 02.12.2016 17:59, Tom Herbert wrote:
> On Fri, Dec 2, 2016 at 3:54 AM, Hannes Frederic Sowa
> <hannes@stressinduktion.org> wrote:
>> On 02.12.2016 11:24, Jesper Dangaard Brouer wrote:
>>> On Thu, 1 Dec 2016 13:51:32 -0800
>>> Tom Herbert <tom@herbertland.com> wrote:
>>>
>>>>>> The technical plenary at last IETF on Seoul a couple of weeks ago was
>>>>>> exclusively focussed on DDOS in light of the recent attack against
>>>>>> Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare
>>>>>> presentation by Nick Sullivan
>>>>>> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
>>>>>> alluded to some implementation of DDOS mitigation. In particular, on
>>>>>> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"
>>>
>>> slide 14
>>>
>>>>>> numbers he gave we're based in iptables+BPF and that was a whole
>>>>>> 1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic
>>>>>> and that's also when I introduced XDP to whole IETF :-) ). If that's
>>>>>> the best we can do the Internet is in a world hurt. DDOS mitigation
>>>>>> alone is probably a sufficient motivation to look at XDP. We need
>>>>>> something that drops bad packets as quickly as possible when under
>>>>>> attack, we need this to be integrated into the stack, we need it to be
>>>>>> programmable to deal with the increasing savvy of attackers, and we
>>>>>> don't want to be forced to be dependent on HW solutions. This is why
>>>>>> we created XDP!
>>>
>>> The 1.2Mpps number is a bit low, but we are unfortunately in that
>>> ballpark.
>>>
>>>>> I totally understand that. But in my reply to David in this thread I
>>>>> mentioned DNS apex processing as being problematic which is actually
>>>>> being referred in your linked slide deck on page 9 ("What do floods look
>>>>> like") and the problematic of parsing DNS packets in XDP due to string
>>>>> processing and looping inside eBPF.
>>>
>>> That is a weak argument. You do realize CloudFlare actually use eBPF to
>>> do this exact filtering, and (so-far) eBPF for parsing DNS have been
>>> sufficient for them.
>>
>> You are talking about this code on the following slides (I actually
>> transcribed it for you here and disassembled):
>>
>> l0:     ld #0x14
>> l1:     ldxb 4*([0]&0xf)
>> l2:     add x
>> l3:     tax
>> l4:     ld [x+0]
>> l5:     jeq #0x7657861, l6, l13
>> l6:     ld [x+4]
>> l7:     jeq #0x6d706c65, l8, l13
>> l8:     ld [x+8]
>> l9:     jeq #0x3636f6d, l10, l13
>> l10:    ldb [x+12]
>> l11:    jeq #0, l12, l13
>> l12:    ret #0x1
>> l13:    ret #0
>>
>> You can offload this to u32 in hardware if that is what you want.
>>
>> The reason this works is because of netfilter, which allows them to
>> dynamically generate BPF programs and insert and delete them from
>> chains, do intersection or unions of them.
>>
>> If you have a freestanding program like in XDP the complexity space is a
>> different one and not comparable to this at all.
>>
> I don't understand this comment about complexity especially in regards
> to the idea of offloading u32 to hardware. Relying on hardware to do
> anything always leads to more complexity than an equivalent SW
> implementation for the same functionality. The only reason we ever use
> a hardware mechanisms is if it gives *significantly* better
> performance. If the performance difference isn't there then doing
> things in SW is going to be the better path (as we see in XDP).

I am just wondering why the u32 filter wasn't mentioned in their slide
deck. If all what Cloudflare needs are those kind of matches, they are
in fact actually easier to generate than an cBPF program. It is not a
good example of how a real world DoS filter in XDP would look like.

If you argue XDP as a C function hook that can call arbitrary code in
the driver before submitting that to the networking stack, yep, that is
not complex at all. Depending on how those modules will be maintained,
they either end up in the kernel and will be updated on major changes or
are 3rd party and people have to update them and also depend on the
driver features.

But this opens up a whole new can of worms also. I haven't really
thought this through completely, but last time the patches were nack'ed
with lots of strong opinions and I tended to agree with them. I am
revisiting this position.

Certainly you can build real-world DoS protection with this function
pointer hook and C code in the driver. In this case a user space
solution still has advantages because of maintainability, as e.g. with
netmap or dpdk you are again decoupled from the in-kernel API/ABI and
don't need to test, recompile etc. on each kernel upgrade. If the module
ends up in the kernel, those problems might also disappear.

For XDP+eBPF to provide a full DoS mitigation (protocol parsing,
sampling and dropping) solution seems to be too complex for me because
of the arguments I stated in my previous mail.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* bpf bounded loops. Was: [flamebait] xdp
  2016-12-01 21:27           ` Hannes Frederic Sowa
  2016-12-01 21:51             ` Tom Herbert
@ 2016-12-02 18:39             ` Alexei Starovoitov
  2016-12-02 19:25               ` Hannes Frederic Sowa
  1 sibling, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2016-12-02 18:39 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On Thu, Dec 01, 2016 at 10:27:12PM +0100, Hannes Frederic Sowa wrote:
> like") and the problematic of parsing DNS packets in XDP due to string
> processing and looping inside eBPF.

Hannes,
Not too long ago you proposed a very interesting idea to add
support for bounded loops without adding any new bpf instructions and
changing llvm (which was way better than my 'rep' like instructions
I was experimenting with). I thought systemtap guys also wanted bounded
loops and you were cooperating on the design, so I gave up on my work and
was expecting an imminent patch from you. I guess it sounds like you know
believe that bounded loops are impossible or I misunderstand your statement ?

As far as pattern search for DNS packets...
it was requested by Cloudflare guys back in March:
https://github.com/iovisor/bcc/issues/471
and it is useful for several tracing use cases as well.
Unfortunately no one had time to implement it yet.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-02 18:39             ` bpf bounded loops. Was: [flamebait] xdp Alexei Starovoitov
@ 2016-12-02 19:25               ` Hannes Frederic Sowa
  2016-12-02 19:42                 ` John Fastabend
                                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-02 19:25 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

Hi,

On 02.12.2016 19:39, Alexei Starovoitov wrote:
> On Thu, Dec 01, 2016 at 10:27:12PM +0100, Hannes Frederic Sowa wrote:
>> like") and the problematic of parsing DNS packets in XDP due to string
>> processing and looping inside eBPF.
> 
> Hannes,
> Not too long ago you proposed a very interesting idea to add
> support for bounded loops without adding any new bpf instructions and
> changing llvm (which was way better than my 'rep' like instructions
> I was experimenting with). I thought systemtap guys also wanted bounded
> loops and you were cooperating on the design, so I gave up on my work and
> was expecting an imminent patch from you. I guess it sounds like you know
> believe that bounded loops are impossible or I misunderstand your statement ?

Your argument was that it would need a new verifier as the current first
pass checks that we indeed can lay out the basic blocks as a DAG which
the second pass depends on. This would be violated.

Because eBPF is available by non privileged users this would need a lot
of effort to rewrite and verify (or indeed keep two verifiers in the
kernel for priv and non-priv). The verifier itself is exposed to
unprivileged users.

Also, by design, if we keep the current limits, this would not give you
more instructions to operate on compared to the flattened version of the
program, it would merely reduce the numbers of optimizations in LLVM
that let the verifier reject the program.

Only enabling the relaxed verifier for root users seemed thus being
problematic as programs wouldn't be portable between nonprivileged and
privileged users.

> As far as pattern search for DNS packets...
> it was requested by Cloudflare guys back in March:
> https://github.com/iovisor/bcc/issues/471
> and it is useful for several tracing use cases as well.
> Unfortunately no one had time to implement it yet.

The string operations you proposed on the other hand, which would count
as one eBPF instructions, would give a lot more flexibility and allow
more cycles to burn, but don't help parsing binary protocols like IPv6
extension headers.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-02 19:25               ` Hannes Frederic Sowa
@ 2016-12-02 19:42                 ` John Fastabend
  2016-12-02 19:50                   ` Hannes Frederic Sowa
  2016-12-03  0:20                   ` Alexei Starovoitov
  2016-12-02 19:42                 ` Hannes Frederic Sowa
  2016-12-05 16:40                 ` Edward Cree
  2 siblings, 2 replies; 40+ messages in thread
From: John Fastabend @ 2016-12-02 19:42 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On 16-12-02 11:25 AM, Hannes Frederic Sowa wrote:
> Hi,
> 
> On 02.12.2016 19:39, Alexei Starovoitov wrote:
>> On Thu, Dec 01, 2016 at 10:27:12PM +0100, Hannes Frederic Sowa wrote:
>>> like") and the problematic of parsing DNS packets in XDP due to string
>>> processing and looping inside eBPF.
>>
>> Hannes,
>> Not too long ago you proposed a very interesting idea to add
>> support for bounded loops without adding any new bpf instructions and
>> changing llvm (which was way better than my 'rep' like instructions
>> I was experimenting with). I thought systemtap guys also wanted bounded
>> loops and you were cooperating on the design, so I gave up on my work and
>> was expecting an imminent patch from you. I guess it sounds like you know
>> believe that bounded loops are impossible or I misunderstand your statement ?
> 
> Your argument was that it would need a new verifier as the current first
> pass checks that we indeed can lay out the basic blocks as a DAG which
> the second pass depends on. This would be violated.
> 
> Because eBPF is available by non privileged users this would need a lot
> of effort to rewrite and verify (or indeed keep two verifiers in the
> kernel for priv and non-priv). The verifier itself is exposed to
> unprivileged users.

I missed this. Why the need for two verifiers?

> 
> Also, by design, if we keep the current limits, this would not give you
> more instructions to operate on compared to the flattened version of the
> program, it would merely reduce the numbers of optimizations in LLVM
> that let the verifier reject the program.
> 
> Only enabling the relaxed verifier for root users seemed thus being
> problematic as programs wouldn't be portable between nonprivileged and
> privileged users.

Still a bit lost what does the relaxed verifier provide here?

> 
>> As far as pattern search for DNS packets...
>> it was requested by Cloudflare guys back in March:
>> https://github.com/iovisor/bcc/issues/471
>> and it is useful for several tracing use cases as well.
>> Unfortunately no one had time to implement it yet.
> 
> The string operations you proposed on the other hand, which would count
> as one eBPF instructions, would give a lot more flexibility and allow
> more cycles to burn, but don't help parsing binary protocols like IPv6
> extension headers.

My rough thinking on this was the verifier had to start looking for loop
invariants and to guarantee termination. Sounds scary in general but
LLVM could put these in some normal form for us and the verifier could
only accept decreasing loops, the invariants could be required to be
integers, etc. By simplifying the loop enough the problem becomes
tractable.

I think this would be better than new instructions and/or multiple
verifiers.

.John

> 
> Bye,
> Hannes
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-02 19:25               ` Hannes Frederic Sowa
  2016-12-02 19:42                 ` John Fastabend
@ 2016-12-02 19:42                 ` Hannes Frederic Sowa
  2016-12-02 23:34                   ` Alexei Starovoitov
  2016-12-05 16:40                 ` Edward Cree
  2 siblings, 1 reply; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-02 19:42 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On Fri, Dec 2, 2016, at 20:25, Hannes Frederic Sowa wrote:
> On 02.12.2016 19:39, Alexei Starovoitov wrote:
> > On Thu, Dec 01, 2016 at 10:27:12PM +0100, Hannes Frederic Sowa wrote:
> >> like") and the problematic of parsing DNS packets in XDP due to string
> >> processing and looping inside eBPF.
> > 
> > Hannes,
> > Not too long ago you proposed a very interesting idea to add
> > support for bounded loops without adding any new bpf instructions and
> > changing llvm (which was way better than my 'rep' like instructions
> > I was experimenting with). I thought systemtap guys also wanted bounded
> > loops and you were cooperating on the design, so I gave up on my work and
> > was expecting an imminent patch from you. I guess it sounds like you know
> > believe that bounded loops are impossible or I misunderstand your statement ?
> 
> Your argument was that it would need a new verifier as the current first
> pass checks that we indeed can lay out the basic blocks as a DAG which
> the second pass depends on. This would be violated.
> 
> Because eBPF is available by non privileged users this would need a lot
> of effort to rewrite and verify (or indeed keep two verifiers in the
> kernel for priv and non-priv). The verifier itself is exposed to
> unprivileged users.
> 
> Also, by design, if we keep the current limits, this would not give you
> more instructions to operate on compared to the flattened version of the
> program, it would merely reduce the numbers of optimizations in LLVM
> that let the verifier reject the program.
> 
> Only enabling the relaxed verifier for root users seemed thus being
> problematic as programs wouldn't be portable between nonprivileged and
> privileged users.

Quick addendum:

The only solution to protect the verifier, which I saw, would be to
limit it by time and space, thus making loading of eBPF programs
depending on how fast and hot (thermal throttling) one CPU thread is.

Those are the complexity problems I am talking and concerned about.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-02 19:42                 ` John Fastabend
@ 2016-12-02 19:50                   ` Hannes Frederic Sowa
  2016-12-03  0:20                   ` Alexei Starovoitov
  1 sibling, 0 replies; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-02 19:50 UTC (permalink / raw)
  To: John Fastabend, Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On Fri, Dec 2, 2016, at 20:42, John Fastabend wrote:
> On 16-12-02 11:25 AM, Hannes Frederic Sowa wrote:
> > Hi,
> > 
> > On 02.12.2016 19:39, Alexei Starovoitov wrote:
> >> On Thu, Dec 01, 2016 at 10:27:12PM +0100, Hannes Frederic Sowa wrote:
> >>> like") and the problematic of parsing DNS packets in XDP due to string
> >>> processing and looping inside eBPF.
> >>
> >> Hannes,
> >> Not too long ago you proposed a very interesting idea to add
> >> support for bounded loops without adding any new bpf instructions and
> >> changing llvm (which was way better than my 'rep' like instructions
> >> I was experimenting with). I thought systemtap guys also wanted bounded
> >> loops and you were cooperating on the design, so I gave up on my work and
> >> was expecting an imminent patch from you. I guess it sounds like you know
> >> believe that bounded loops are impossible or I misunderstand your statement ?
> > 
> > Your argument was that it would need a new verifier as the current first
> > pass checks that we indeed can lay out the basic blocks as a DAG which
> > the second pass depends on. This would be violated.
> > 
> > Because eBPF is available by non privileged users this would need a lot
> > of effort to rewrite and verify (or indeed keep two verifiers in the
> > kernel for priv and non-priv). The verifier itself is exposed to
> > unprivileged users.
> 
> I missed this. Why the need for two verifiers?

Because of my fear that a more complex verifier will fail to provide the
same security guarantees than the old one, which already is relatively
complex.

> > Also, by design, if we keep the current limits, this would not give you
> > more instructions to operate on compared to the flattened version of the
> > program, it would merely reduce the numbers of optimizations in LLVM
> > that let the verifier reject the program.
> > 
> > Only enabling the relaxed verifier for root users seemed thus being
> > problematic as programs wouldn't be portable between nonprivileged and
> > privileged users.
> 
> Still a bit lost what does the relaxed verifier provide here?

It would allow a new instruction that is able to jump backwards. Ideally
it would be one verifier that allows this instruction and inserts the
counting logic in the BPF program.

> >> As far as pattern search for DNS packets...
> >> it was requested by Cloudflare guys back in March:
> >> https://github.com/iovisor/bcc/issues/471
> >> and it is useful for several tracing use cases as well.
> >> Unfortunately no one had time to implement it yet.
> > 
> > The string operations you proposed on the other hand, which would count
> > as one eBPF instructions, would give a lot more flexibility and allow
> > more cycles to burn, but don't help parsing binary protocols like IPv6
> > extension headers.
> 
> My rough thinking on this was the verifier had to start looking for loop
> invariants and to guarantee termination. Sounds scary in general but
> LLVM could put these in some normal form for us and the verifier could
> only accept decreasing loops, the invariants could be required to be
> integers, etc. By simplifying the loop enough the problem becomes
> tractable.

Which wouldn't buy more than LLVM simply unrolling everything, no?
Otherwise a lot of optimizations passes need to be touched. Alexei, what
do you think of this idea?

> I think this would be better than new instructions and/or multiple
> verifiers.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-02 18:12                     ` Hannes Frederic Sowa
@ 2016-12-02 19:56                       ` Stephen Hemminger
  2016-12-02 20:19                         ` Tom Herbert
  0 siblings, 1 reply; 40+ messages in thread
From: Stephen Hemminger @ 2016-12-02 19:56 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Tom Herbert, Jesper Dangaard Brouer, Thomas Graf,
	Florian Westphal, Linux Kernel Network Developers

On Fri, 2 Dec 2016 19:12:00 +0100
Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:

> On 02.12.2016 17:59, Tom Herbert wrote:
> > On Fri, Dec 2, 2016 at 3:54 AM, Hannes Frederic Sowa
> > <hannes@stressinduktion.org> wrote:  
> >> On 02.12.2016 11:24, Jesper Dangaard Brouer wrote:  
> >>> On Thu, 1 Dec 2016 13:51:32 -0800
> >>> Tom Herbert <tom@herbertland.com> wrote:
> >>>  
> >>>>>> The technical plenary at last IETF on Seoul a couple of weeks ago was
> >>>>>> exclusively focussed on DDOS in light of the recent attack against
> >>>>>> Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare
> >>>>>> presentation by Nick Sullivan
> >>>>>> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
> >>>>>> alluded to some implementation of DDOS mitigation. In particular, on
> >>>>>> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"  
> >>>
> >>> slide 14
> >>>  
> >>>>>> numbers he gave we're based in iptables+BPF and that was a whole
> >>>>>> 1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic
> >>>>>> and that's also when I introduced XDP to whole IETF :-) ). If that's
> >>>>>> the best we can do the Internet is in a world hurt. DDOS mitigation
> >>>>>> alone is probably a sufficient motivation to look at XDP. We need
> >>>>>> something that drops bad packets as quickly as possible when under
> >>>>>> attack, we need this to be integrated into the stack, we need it to be
> >>>>>> programmable to deal with the increasing savvy of attackers, and we
> >>>>>> don't want to be forced to be dependent on HW solutions. This is why
> >>>>>> we created XDP!  
> >>>
> >>> The 1.2Mpps number is a bit low, but we are unfortunately in that
> >>> ballpark.
> >>>  
> >>>>> I totally understand that. But in my reply to David in this thread I
> >>>>> mentioned DNS apex processing as being problematic which is actually
> >>>>> being referred in your linked slide deck on page 9 ("What do floods look
> >>>>> like") and the problematic of parsing DNS packets in XDP due to string
> >>>>> processing and looping inside eBPF.  
> >>>
> >>> That is a weak argument. You do realize CloudFlare actually use eBPF to
> >>> do this exact filtering, and (so-far) eBPF for parsing DNS have been
> >>> sufficient for them.  
> >>
> >> You are talking about this code on the following slides (I actually
> >> transcribed it for you here and disassembled):
> >>
> >> l0:     ld #0x14
> >> l1:     ldxb 4*([0]&0xf)
> >> l2:     add x
> >> l3:     tax
> >> l4:     ld [x+0]
> >> l5:     jeq #0x7657861, l6, l13
> >> l6:     ld [x+4]
> >> l7:     jeq #0x6d706c65, l8, l13
> >> l8:     ld [x+8]
> >> l9:     jeq #0x3636f6d, l10, l13
> >> l10:    ldb [x+12]
> >> l11:    jeq #0, l12, l13
> >> l12:    ret #0x1
> >> l13:    ret #0
> >>
> >> You can offload this to u32 in hardware if that is what you want.
> >>
> >> The reason this works is because of netfilter, which allows them to
> >> dynamically generate BPF programs and insert and delete them from
> >> chains, do intersection or unions of them.
> >>
> >> If you have a freestanding program like in XDP the complexity space is a
> >> different one and not comparable to this at all.
> >>  
> > I don't understand this comment about complexity especially in regards
> > to the idea of offloading u32 to hardware. Relying on hardware to do
> > anything always leads to more complexity than an equivalent SW
> > implementation for the same functionality. The only reason we ever use
> > a hardware mechanisms is if it gives *significantly* better
> > performance. If the performance difference isn't there then doing
> > things in SW is going to be the better path (as we see in XDP).  
> 
> I am just wondering why the u32 filter wasn't mentioned in their slide
> deck. If all what Cloudflare needs are those kind of matches, they are
> in fact actually easier to generate than an cBPF program. It is not a
> good example of how a real world DoS filter in XDP would look like.
> 
> If you argue XDP as a C function hook that can call arbitrary code in
> the driver before submitting that to the networking stack, yep, that is
> not complex at all. Depending on how those modules will be maintained,
> they either end up in the kernel and will be updated on major changes or
> are 3rd party and people have to update them and also depend on the
> driver features.
> 
> But this opens up a whole new can of worms also. I haven't really
> thought this through completely, but last time the patches were nack'ed
> with lots of strong opinions and I tended to agree with them. I am
> revisiting this position.
> 
> Certainly you can build real-world DoS protection with this function
> pointer hook and C code in the driver. In this case a user space
> solution still has advantages because of maintainability, as e.g. with
> netmap or dpdk you are again decoupled from the in-kernel API/ABI and
> don't need to test, recompile etc. on each kernel upgrade. If the module
> ends up in the kernel, those problems might also disappear.
> 
> For XDP+eBPF to provide a full DoS mitigation (protocol parsing,
> sampling and dropping) solution seems to be too complex for me because
> of the arguments I stated in my previous mail.

I take a "horses for courses" attitude.
 - XDP is better for providing high speed packet mangling. It is more
   programmable and faster than existing TC, iptables, nftables, infrastructure.

 - DPDK is better for implementing a networking infrastructure application.

To give two examples.  Implementing something as complex as FD.io/VPP
with XDP would be massive undertaking and not worth the effort. Likewise
reimplementing the full Linux networking stack with all the work on
congestion control, queue management and socket API's in DPDK would
be waste of effort. That is not to say that someone won't try it,
but it will create more bloat and bugs.

Unfortunately, both camps seem to have a high NIMBY quotient and
things are being developed for their own self interest. This is ok
as long as the competition yields better software, but I am little
concerned that is just going to cause more complexity with no gain.

Also, the end users are confused. I have heard from people involved
in NFV that want to use XDP. And users of server applications that
want to use DPDK.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-02 19:56                       ` Stephen Hemminger
@ 2016-12-02 20:19                         ` Tom Herbert
  0 siblings, 0 replies; 40+ messages in thread
From: Tom Herbert @ 2016-12-02 20:19 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Hannes Frederic Sowa, Jesper Dangaard Brouer, Thomas Graf,
	Florian Westphal, Linux Kernel Network Developers

On Fri, Dec 2, 2016 at 11:56 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Fri, 2 Dec 2016 19:12:00 +0100
> Hannes Frederic Sowa <hannes@stressinduktion.org> wrote:
>
>> On 02.12.2016 17:59, Tom Herbert wrote:
>> > On Fri, Dec 2, 2016 at 3:54 AM, Hannes Frederic Sowa
>> > <hannes@stressinduktion.org> wrote:
>> >> On 02.12.2016 11:24, Jesper Dangaard Brouer wrote:
>> >>> On Thu, 1 Dec 2016 13:51:32 -0800
>> >>> Tom Herbert <tom@herbertland.com> wrote:
>> >>>
>> >>>>>> The technical plenary at last IETF on Seoul a couple of weeks ago was
>> >>>>>> exclusively focussed on DDOS in light of the recent attack against
>> >>>>>> Dyn. There were speakers form Cloudflare and Dyn. The Cloudflare
>> >>>>>> presentation by Nick Sullivan
>> >>>>>> (https://www.ietf.org/proceedings/97/slides/slides-97-ietf-sessb-how-to-stay-online-harsh-realities-of-operating-in-a-hostile-network-nick-sullivan-01.pdf)
>> >>>>>> alluded to some implementation of DDOS mitigation. In particular, on
>> >>>>>> slide 6 Nick gave some numbers for drop rates in DDOS. The "kernel"
>> >>>
>> >>> slide 14
>> >>>
>> >>>>>> numbers he gave we're based in iptables+BPF and that was a whole
>> >>>>>> 1.2Mpps-- somehow that seems ridiculously to me (I said so at the mic
>> >>>>>> and that's also when I introduced XDP to whole IETF :-) ). If that's
>> >>>>>> the best we can do the Internet is in a world hurt. DDOS mitigation
>> >>>>>> alone is probably a sufficient motivation to look at XDP. We need
>> >>>>>> something that drops bad packets as quickly as possible when under
>> >>>>>> attack, we need this to be integrated into the stack, we need it to be
>> >>>>>> programmable to deal with the increasing savvy of attackers, and we
>> >>>>>> don't want to be forced to be dependent on HW solutions. This is why
>> >>>>>> we created XDP!
>> >>>
>> >>> The 1.2Mpps number is a bit low, but we are unfortunately in that
>> >>> ballpark.
>> >>>
>> >>>>> I totally understand that. But in my reply to David in this thread I
>> >>>>> mentioned DNS apex processing as being problematic which is actually
>> >>>>> being referred in your linked slide deck on page 9 ("What do floods look
>> >>>>> like") and the problematic of parsing DNS packets in XDP due to string
>> >>>>> processing and looping inside eBPF.
>> >>>
>> >>> That is a weak argument. You do realize CloudFlare actually use eBPF to
>> >>> do this exact filtering, and (so-far) eBPF for parsing DNS have been
>> >>> sufficient for them.
>> >>
>> >> You are talking about this code on the following slides (I actually
>> >> transcribed it for you here and disassembled):
>> >>
>> >> l0:     ld #0x14
>> >> l1:     ldxb 4*([0]&0xf)
>> >> l2:     add x
>> >> l3:     tax
>> >> l4:     ld [x+0]
>> >> l5:     jeq #0x7657861, l6, l13
>> >> l6:     ld [x+4]
>> >> l7:     jeq #0x6d706c65, l8, l13
>> >> l8:     ld [x+8]
>> >> l9:     jeq #0x3636f6d, l10, l13
>> >> l10:    ldb [x+12]
>> >> l11:    jeq #0, l12, l13
>> >> l12:    ret #0x1
>> >> l13:    ret #0
>> >>
>> >> You can offload this to u32 in hardware if that is what you want.
>> >>
>> >> The reason this works is because of netfilter, which allows them to
>> >> dynamically generate BPF programs and insert and delete them from
>> >> chains, do intersection or unions of them.
>> >>
>> >> If you have a freestanding program like in XDP the complexity space is a
>> >> different one and not comparable to this at all.
>> >>
>> > I don't understand this comment about complexity especially in regards
>> > to the idea of offloading u32 to hardware. Relying on hardware to do
>> > anything always leads to more complexity than an equivalent SW
>> > implementation for the same functionality. The only reason we ever use
>> > a hardware mechanisms is if it gives *significantly* better
>> > performance. If the performance difference isn't there then doing
>> > things in SW is going to be the better path (as we see in XDP).
>>
>> I am just wondering why the u32 filter wasn't mentioned in their slide
>> deck. If all what Cloudflare needs are those kind of matches, they are
>> in fact actually easier to generate than an cBPF program. It is not a
>> good example of how a real world DoS filter in XDP would look like.
>>
>> If you argue XDP as a C function hook that can call arbitrary code in
>> the driver before submitting that to the networking stack, yep, that is
>> not complex at all. Depending on how those modules will be maintained,
>> they either end up in the kernel and will be updated on major changes or
>> are 3rd party and people have to update them and also depend on the
>> driver features.
>>
>> But this opens up a whole new can of worms also. I haven't really
>> thought this through completely, but last time the patches were nack'ed
>> with lots of strong opinions and I tended to agree with them. I am
>> revisiting this position.
>>
>> Certainly you can build real-world DoS protection with this function
>> pointer hook and C code in the driver. In this case a user space
>> solution still has advantages because of maintainability, as e.g. with
>> netmap or dpdk you are again decoupled from the in-kernel API/ABI and
>> don't need to test, recompile etc. on each kernel upgrade. If the module
>> ends up in the kernel, those problems might also disappear.
>>
>> For XDP+eBPF to provide a full DoS mitigation (protocol parsing,
>> sampling and dropping) solution seems to be too complex for me because
>> of the arguments I stated in my previous mail.
>
> I take a "horses for courses" attitude.
>  - XDP is better for providing high speed packet mangling. It is more
>    programmable and faster than existing TC, iptables, nftables, infrastructure.
>
>  - DPDK is better for implementing a networking infrastructure application.
>
> To give two examples.  Implementing something as complex as FD.io/VPP
> with XDP would be massive undertaking and not worth the effort. Likewise
> reimplementing the full Linux networking stack with all the work on
> congestion control, queue management and socket API's in DPDK would
> be waste of effort. That is not to say that someone won't try it,
> but it will create more bloat and bugs.
>
> Unfortunately, both camps seem to have a high NIMBY quotient and
> things are being developed for their own self interest. This is ok
> as long as the competition yields better software, but I am little
> concerned that is just going to cause more complexity with no gain.
>
> Also, the end users are confused. I have heard from people involved
> in NFV that want to use XDP. And users of server applications that
> want to use DPDK.
>
As davem said at netdev: "DPDK is not Linux". I don't see that it's
our problem to try to figure out how to make DPDK complementary
somehow or work together in harmony to resolve end user confusion. If
users want to use DPDK that's their prerogative,  but other that
providing a good reference for performance I don't see how this
impacts Linux nor the direction we need to take things. We've already
seen this same story play out with RDMA over the years...

Tom

>
>
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-02 19:42                 ` Hannes Frederic Sowa
@ 2016-12-02 23:34                   ` Alexei Starovoitov
  2016-12-04 16:05                     ` [flamebait] xdp Was: " Hannes Frederic Sowa
  0 siblings, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2016-12-02 23:34 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On Fri, Dec 02, 2016 at 08:42:41PM +0100, Hannes Frederic Sowa wrote:
> On Fri, Dec 2, 2016, at 20:25, Hannes Frederic Sowa wrote:
> > On 02.12.2016 19:39, Alexei Starovoitov wrote:
> > > On Thu, Dec 01, 2016 at 10:27:12PM +0100, Hannes Frederic Sowa wrote:
> > >> like") and the problematic of parsing DNS packets in XDP due to string
> > >> processing and looping inside eBPF.
> > > 
> > > Hannes,
> > > Not too long ago you proposed a very interesting idea to add
> > > support for bounded loops without adding any new bpf instructions and
> > > changing llvm (which was way better than my 'rep' like instructions
> > > I was experimenting with). I thought systemtap guys also wanted bounded
> > > loops and you were cooperating on the design, so I gave up on my work and
> > > was expecting an imminent patch from you. I guess it sounds like you know
> > > believe that bounded loops are impossible or I misunderstand your statement ?
> > 
> > Your argument was that it would need a new verifier as the current first
> > pass checks that we indeed can lay out the basic blocks as a DAG which
> > the second pass depends on. This would be violated.

yes. today the main part of verifier depends on cfg check that confirms DAG
property of the program. This was done as a simplification for the algorithm,
so any programmer that understands C can understand the verifier code.
It certainly was the case, since most of the people who hacked
verifier had zero compiler background.
Now I'm thinking to introduce proper compiler technologies to it.
On one side it will make the bar to understand higher and on the other
side it will cleanup the logic and reuse tens of years of data flow
analysis theory and will make verifier more robust and mathematically
solid.

> > Because eBPF is available by non privileged users this would need a lot
> > of effort to rewrite and verify (or indeed keep two verifiers in the
> > kernel for priv and non-priv). The verifier itself is exposed to
> > unprivileged users.

I certainly hear your concerns that people unfamiliar with it are simply
scared that more and more verification logic being added. So I don't mind
freezing current verifier for unpriv and let proper data flow analysis
to be done in root only component.

> > Also, by design, if we keep the current limits, this would not give you
> > more instructions to operate on compared to the flattened version of the
> > program, it would merely reduce the numbers of optimizations in LLVM
> > that let the verifier reject the program.

I think we most likely will keep 4k insn limit (since there were no
requests to increase it). The bounded loops will improve performance
and reduce I-cache misses.

> The only solution to protect the verifier, which I saw, would be to
> limit it by time and space, thus making loading of eBPF programs
> depending on how fast and hot (thermal throttling) one CPU thread is.

the verifier already has time and space limits.
See no reason to rely on physical cpu sensors.

> Those are the complexity problems I am talking and concerned about.

Do you have concerns when people implement encryption algorithm
that you're unfamiliar with?
Isn't it much bigger concern, since any bugs in the algorithm
are directly exploitable and when encryption is actually used
it's protecting sensitive data, whereas here the verifier
protects kernel from crashing.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-02 19:42                 ` John Fastabend
  2016-12-02 19:50                   ` Hannes Frederic Sowa
@ 2016-12-03  0:20                   ` Alexei Starovoitov
  2016-12-03  9:11                     ` Sargun Dhillon
  1 sibling, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2016-12-03  0:20 UTC (permalink / raw)
  To: John Fastabend
  Cc: Hannes Frederic Sowa, Tom Herbert, Thomas Graf,
	Linux Kernel Network Developers, Daniel Borkmann,
	David S. Miller

On Fri, Dec 02, 2016 at 11:42:15AM -0800, John Fastabend wrote:
> >> As far as pattern search for DNS packets...
> >> it was requested by Cloudflare guys back in March:
> >> https://github.com/iovisor/bcc/issues/471
> >> and it is useful for several tracing use cases as well.
> >> Unfortunately no one had time to implement it yet.
> > 
> > The string operations you proposed on the other hand, which would count
> > as one eBPF instructions, would give a lot more flexibility and allow
> > more cycles to burn, but don't help parsing binary protocols like IPv6
> > extension headers.

these are two separate things. we need pattern search regardless
of bounded loops. bpf program shouldn't be doing any complicated
algorithms. The main reasons to have loops are:
- speed up execution (smaller I-cache footprint)
- avoid forcing compiler to unroll loops (easier for users)
- support loops where unroll is not possible (like example below)

> My rough thinking on this was the verifier had to start looking for loop
> invariants and to guarantee termination. Sounds scary in general but
> LLVM could put these in some normal form for us and the verifier could
> only accept decreasing loops, the invariants could be required to be
> integers, etc. By simplifying the loop enough the problem becomes
> tractable.

yep. I think what Hannes was proposing earlier is straighforward
to implement for a compiler guy. The following:
for (int i = 0; i < (var & 0xff); i++)
  sum += map->value[i];  /* map value_size >= 0xff */
is obviously bounded and dataflow analysis can easily prove
that all memory operations are valid.
Static analysis tools do way way more than this.

> I think this would be better than new instructions and/or multiple
> verifiers.

agree that it's better than new instructions that would have
required JIT changes. Though there are pros to new insns too :)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-03  0:20                   ` Alexei Starovoitov
@ 2016-12-03  9:11                     ` Sargun Dhillon
  0 siblings, 0 replies; 40+ messages in thread
From: Sargun Dhillon @ 2016-12-03  9:11 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: John Fastabend, Hannes Frederic Sowa, Tom Herbert, Thomas Graf,
	Linux Kernel Network Developers, Daniel Borkmann,
	David S. Miller

On Fri, Dec 2, 2016 at 4:20 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Fri, Dec 02, 2016 at 11:42:15AM -0800, John Fastabend wrote:
>> >> As far as pattern search for DNS packets...
>> >> it was requested by Cloudflare guys back in March:
>> >> https://github.com/iovisor/bcc/issues/471
>> >> and it is useful for several tracing use cases as well.
>> >> Unfortunately no one had time to implement it yet.
>> >
>> > The string operations you proposed on the other hand, which would count
>> > as one eBPF instructions, would give a lot more flexibility and allow
>> > more cycles to burn, but don't help parsing binary protocols like IPv6
>> > extension headers.
>
> these are two separate things. we need pattern search regardless
> of bounded loops. bpf program shouldn't be doing any complicated
> algorithms. The main reasons to have loops are:
> - speed up execution (smaller I-cache footprint)
> - avoid forcing compiler to unroll loops (easier for users)
> - support loops where unroll is not possible (like example below)
>
>> My rough thinking on this was the verifier had to start looking for loop
>> invariants and to guarantee termination. Sounds scary in general but
>> LLVM could put these in some normal form for us and the verifier could
>> only accept decreasing loops, the invariants could be required to be
>> integers, etc. By simplifying the loop enough the problem becomes
>> tractable.
>
> yep. I think what Hannes was proposing earlier is straighforward
> to implement for a compiler guy. The following:
> for (int i = 0; i < (var & 0xff); i++)
>   sum += map->value[i];  /* map value_size >= 0xff */
> is obviously bounded and dataflow analysis can easily prove
> that all memory operations are valid.
> Static analysis tools do way way more than this.
>
>> I think this would be better than new instructions and/or multiple
>> verifiers.
>
> agree that it's better than new instructions that would have
> required JIT changes. Though there are pros to new insns too :)
>
Has there been any thought to adding a map, or foldl helper a la the
tail call helper? Although you'd want to allocate an accumulator of
kinds for the foldl, I imagine this could be bounded in size quite
small for things like binary parsing operations -- we could reasonably
allow the accumulator to be updated, and return a special value to
exit the loop. I also started working on a map function a while ago
which would call a bpf program for each set cell in an arraymap, and
each set key/value in a hash map.

My intent was to intentionally make it so I could do this on the
context itself, so I could do encryption in BPF. I wanted to be able
to fold over the packet 16, or 32 bytes at a time, and (1) modify the
content, and (2) generate the authentication tag.

Any opinions on that approach?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-02 17:22 ` Jesper Dangaard Brouer
@ 2016-12-03 16:19   ` Willem de Bruijn
  2016-12-03 19:48     ` John Fastabend
  0 siblings, 1 reply; 40+ messages in thread
From: Willem de Bruijn @ 2016-12-03 16:19 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Florian Westphal, Network Development

On Fri, Dec 2, 2016 at 12:22 PM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Thu, 1 Dec 2016 10:11:08 +0100 Florian Westphal <fw@strlen.de> wrote:
>
>> In light of DPDKs existence it make a lot more sense to me to provide
>> a). a faster mmap based interface (possibly AF_PACKET based) that allows
>> to map nic directly into userspace, detaching tx/rx queue from kernel.
>>
>> John Fastabend sent something like this last year as a proof of
>> concept, iirc it was rejected because register space got exposed directly
>> to userspace.  I think we should re-consider merging netmap
>> (or something conceptually close to its design).
>
> I'm actually working in this direction, of zero-copy RX mapping packets
> into userspace.  This work is mostly related to page_pool, and I only
> plan to use XDP as a filter for selecting packets going to userspace,
> as this choice need to be taken very early.
>
> My design is here:
>  https://prototype-kernel.readthedocs.io/en/latest/vm/page_pool/design/memory_model_nic.html
>
> This is mostly about changing the memory model in the drivers, to allow
> for safely mapping pages to userspace.  (An efficient queue mechanism is
> not covered).

Virtio virtqueues are used in various other locations in the stack.
With separate memory pools and send + completion descriptor rings,
signal moderation, careful avoidance of cacheline bouncing, etc. these
seem like a good opportunity for a TPACKET_V4 format.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-03 16:19   ` Willem de Bruijn
@ 2016-12-03 19:48     ` John Fastabend
  2016-12-05 11:04       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 40+ messages in thread
From: John Fastabend @ 2016-12-03 19:48 UTC (permalink / raw)
  To: Willem de Bruijn, Jesper Dangaard Brouer
  Cc: Florian Westphal, Network Development

On 16-12-03 08:19 AM, Willem de Bruijn wrote:
> On Fri, Dec 2, 2016 at 12:22 PM, Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
>>
>> On Thu, 1 Dec 2016 10:11:08 +0100 Florian Westphal <fw@strlen.de> wrote:
>>
>>> In light of DPDKs existence it make a lot more sense to me to provide
>>> a). a faster mmap based interface (possibly AF_PACKET based) that allows
>>> to map nic directly into userspace, detaching tx/rx queue from kernel.
>>>
>>> John Fastabend sent something like this last year as a proof of
>>> concept, iirc it was rejected because register space got exposed directly
>>> to userspace.  I think we should re-consider merging netmap
>>> (or something conceptually close to its design).
>>
>> I'm actually working in this direction, of zero-copy RX mapping packets
>> into userspace.  This work is mostly related to page_pool, and I only
>> plan to use XDP as a filter for selecting packets going to userspace,
>> as this choice need to be taken very early.
>>
>> My design is here:
>>  https://prototype-kernel.readthedocs.io/en/latest/vm/page_pool/design/memory_model_nic.html
>>
>> This is mostly about changing the memory model in the drivers, to allow
>> for safely mapping pages to userspace.  (An efficient queue mechanism is
>> not covered).
> 
> Virtio virtqueues are used in various other locations in the stack.
> With separate memory pools and send + completion descriptor rings,
> signal moderation, careful avoidance of cacheline bouncing, etc. these
> seem like a good opportunity for a TPACKET_V4 format.
> 

FWIW. After we rejected exposing the register space to user space due to
valid security issues we fell back to using VFIO which works nicely for
mapping virtual functions into userspace and VMs. The main  drawback is
user space has to manage the VF but that is mostly a solved problem at
this point. Deployment concerns aside.

There was a TPACKET_V4 version we had a prototype of that passed
buffers down to the hardware to use with the dma engine. This gives
zero-copy but same as VFs requires the hardware to do all the steering
of traffic and any expected policy in front of the application. Due to
requiring user space to kick hardware and vice versa though it was
somewhat slower so I didn't finish it up. The kick was implemented as a
syscall iirc. I can maybe look at it a bit more next week and see if its
worth reviving now in this context.

I don't think any of this requires page pools though. Or rather tpacket
and vhost/virtio already know how to do page pools is perhaps the other
way to look at it.

One idea I've been playing around with is a vhost backend using
tpacketv{3|4} so we don't require socket manipulation.

Thanks,
John

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp Was: Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-02 23:34                   ` Alexei Starovoitov
@ 2016-12-04 16:05                     ` Hannes Frederic Sowa
  2016-12-06  3:05                       ` Alexei Starovoitov
  0 siblings, 1 reply; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-04 16:05 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

Hello,

On 03.12.2016 00:34, Alexei Starovoitov wrote:
> On Fri, Dec 02, 2016 at 08:42:41PM +0100, Hannes Frederic Sowa wrote:
>> On Fri, Dec 2, 2016, at 20:25, Hannes Frederic Sowa wrote:
>>> On 02.12.2016 19:39, Alexei Starovoitov wrote:
>>>> On Thu, Dec 01, 2016 at 10:27:12PM +0100, Hannes Frederic Sowa wrote:
>>>>> like") and the problematic of parsing DNS packets in XDP due to string
>>>>> processing and looping inside eBPF.
>>>>
>>>> Hannes,
>>>> Not too long ago you proposed a very interesting idea to add
>>>> support for bounded loops without adding any new bpf instructions and
>>>> changing llvm (which was way better than my 'rep' like instructions
>>>> I was experimenting with). I thought systemtap guys also wanted bounded
>>>> loops and you were cooperating on the design, so I gave up on my work and
>>>> was expecting an imminent patch from you. I guess it sounds like you know
>>>> believe that bounded loops are impossible or I misunderstand your statement ?
>>>
>>> Your argument was that it would need a new verifier as the current first
>>> pass checks that we indeed can lay out the basic blocks as a DAG which
>>> the second pass depends on. This would be violated.
> 
> yes. today the main part of verifier depends on cfg check that confirms DAG
> property of the program. This was done as a simplification for the algorithm,
> so any programmer that understands C can understand the verifier code.
> It certainly was the case, since most of the people who hacked
> verifier had zero compiler background.
> Now I'm thinking to introduce proper compiler technologies to it.
> On one side it will make the bar to understand higher and on the other
> side it will cleanup the logic and reuse tens of years of data flow
> analysis theory and will make verifier more robust and mathematically
> solid.

See below.

>>> Because eBPF is available by non privileged users this would need a lot
>>> of effort to rewrite and verify (or indeed keep two verifiers in the
>>> kernel for priv and non-priv). The verifier itself is exposed to
>>> unprivileged users.
> 
> I certainly hear your concerns that people unfamiliar with it are simply
> scared that more and more verification logic being added. So I don't mind
> freezing current verifier for unpriv and let proper data flow analysis
> to be done in root only component.
> 
>>> Also, by design, if we keep the current limits, this would not give you
>>> more instructions to operate on compared to the flattened version of the
>>> program, it would merely reduce the numbers of optimizations in LLVM
>>> that let the verifier reject the program.
> 
> I think we most likely will keep 4k insn limit (since there were no
> requests to increase it). The bounded loops will improve performance
> and reduce I-cache misses.

I agree that bounded loops will increase performance and in general I
see lifting this limitation as something good if it works out.

>> The only solution to protect the verifier, which I saw, would be to
>> limit it by time and space, thus making loading of eBPF programs
>> depending on how fast and hot (thermal throttling) one CPU thread is.
> 
> the verifier already has time and space limits.
> See no reason to rely on physical cpu sensors.

Time and space is bounded by the DAG property. It is still bounded in
the directed cyclic case (some arbitrary upper limit), but can have a
combinatorical explosion because of the switch from proving properties
for each node+state to prove properties for each path+state.

Compiler algorithms maybe a help here, but historically have focused on
other properties, mainly optimization and thus are mostly heuristics.

Compiler developers don't write the algorithm under the assumption they
will execute in a security and resource sensitive environment (only the
generated code should be safe). I believe that optimization algorithms
increase the attack surface, as their big-O worst case might be an
additional cost, additionally to the worst case verification path. I
don't think compiler engineers think about the code to optimize
attacking the optimization algorithm itself.

Verification of a malicious BPF program (complexity bomb) in the kernel
should not disrupt nor create a security threat (we are also mostly
voluntary preemptible in this code and hold a lock). Verification might
fail when memory fragmentation becomes more probably, as the huge state
table for path sensitive verification cannot be allocated.

In user space, instead of verification of many properties regarding
program state (which you actually need in the BPF verifier), the
development effort concentrates on sanitizers. Otherwise look how good
gcc is in finding uninitialized variables.

Most mathematical proofs that are written for compiler optimization I
know of are written to show equivalence between program text and the
optimization. Compile time recently became a more important aspect of
the open source compilers at least.

I am happy to be shown wrong here and assume there is no better
algorithm, which is reasonable easy for the kernel - I am a pessimist
but am happy to look at proposals.

>> Those are the complexity problems I am talking and concerned about.
> 
> Do you have concerns when people implement encryption algorithm
> that you're unfamiliar with?

Absolutely not, this can already be done in user space very much the
same, I can remove it, delete it, uninstall it. APIs in the kernel stay
forever.

> Isn't it much bigger concern, since any bugs in the algorithm
> are directly exploitable and when encryption is actually used
> it's protecting sensitive data, whereas here the verifier
> protects kernel from crashing.

Bugs happen everywhere, but a bug in the kernel itself creates a whole
bigger mess than a bug in a security constrained user space application
with proper privilege drops (at least it should).

The complexity arguments in random order:

1) The code itself - if the route is taken to provide two verifiers for
eBPF, this certainly comes with a maintenance cost to keep them secure
(secure as in the verifier itself should not expose a threat neither is
the to be verified program allowed to leak data or exceed its address
space nor some time budget).

If one of those eBPF verifiers only accepts a certain number of INSN, as
fundamental as backwards jumps, we might end up with two compiler?

At some point Google might start to fuzz them like right now other
fundamental parts of the kernel and someone must review all the code and
fix them in both of them.

If we shift verification to new heuristics programs will still fail,
just under new conditions, this exposes complexity to the users.
Libraries cannot even be exchanged between those twos.

2) Integration and API considerations:

E.g. the newly added cgroup socket bpf interface, which can modify
bound_dev_if. If you restart networking (old way
/etc/init.d/networking), those if_indexes are all outdated and the
system will behave strangely. Network management software needs to walk
cgroup (a different kernel subsystem) now, too, in order to refresh
outdated information, recompile and insert. Even worse, all user space
programs which might be in the cgroups need to be restarted, too, as the
change is permanent per socket lifetime. Instead of reconfiguring
networking, I will probably now restart the whole box.
Furthermore it gets even more complicated because cgroup can be
namespace'd nowadays and doesn't necessarily need to have no 1:1
relation with the network namespace, so this gets really complicated.

Misconfiguration causes security problems.

In the past we had dependency trees and notifiers to e.g. clean up IPs,
multicast, disable netfilter rules, routes, etc., because the current
kernel tries to keep a specific semantic when you disable an interface
and bring it back up or do some other configuration change.

Like in the eBPF example with cgroups above, networking stack state
might become hard coded in XDP programs because of simplicity and
regularly needs to be refreshed to correspond linux networking changes
from user space, too. User space program crashes or gets killed by an
admin and doesn't remember to clean-up, the kernel is partly tainted
with some state you can search in different subsystems, cgroups, XDP,
qdiscs and clean up yourself (if we assume we can't do everything in XDP).

All those changes pretty much speak for a new user space control plane,
which wasn't that much necessary so far. Two places to keep your data up
to date creates intermediate states where it is not up to date and also
the update process itself might be complex (e.g. simple things like
unicast/multicast address filter changes on the NIC vs. what the XDP
program thinks). Ergo, more complexity. What do you do when one of those
two systems fail? What is the reference data? What do you do if on a
highly busy box during DoS constant reloading of your vmalloc happens (I
don't know if it is a problem under DoS)?

Lot's of effort went into transparent hw offloading to still retain this
property of transparency and behave as a proper citizen.

If XDP should not be considered an outsider of the networking stack, it
should as well understand reconfiguration events which leads to leakage
of kernel internals into XDP (e.g. Jesper's example of writing a bridge
in XDP, or offloading examples where we need some network config pieces).

I find it strange that hw offloading people try as much as possible to
discover the state of the linux networking stack inside the kernel and
apply this state transparently to hw offloads. Will there be shulti
switch available to allow user space to sniff switchdev events to
compile them to XDP or update hashtables?

"Software offloading" basically brings the control and dataplane into
user space (not technically but from the users PoV), making it
abnormally difficult to fit in with the traditional network admin tools.

3) Security complexity:

I was e.g. wondering if there is an architectural TOCTTOU in
"env->allow_ptr_leaks = capable(CAP_SYS_ADMIN);" in the verifier,
because bpf objects can be easily passed around and be attached to file
descriptors etc. of processes that might be sandboxed.

Leaks of pointers can e.g. lead to discover the state of address space
layout randomization. Can we now pass verifier-ng verified eBPF programs
around to untrusted programs? To sandboxes? Need they be tainted? Do we
need to consider that for bpffs? What about user namespaces? If it turns
out to be a problem now, would a change like this be backwards compatible?

Helpers for eBPF need to be checked regularly if they still provide
their safety guarantees, especially if they call into other subsystems,
also when new features get added.

4) Complexity for users of this framework (compiler problems solved):

I tried to argue that someone wanting to build netmap/DPDK-alike things
in XDP, one faces the problem of synchronized IPC. Hashmaps solve this
to some degree but cannot be synchronized. Recent offloading series
showed e.g. how hard it is to keep state up-to-date in multiple places.
If XDP-user space sharing happens this might be solved.

Adding external functions to eBPF might also freeze some kernel
internals or the eBPF wrappers might become complex down some years.

DPDK even can configure various hw offloads already before the kernel
can do so. If users want to use those, they switch to DPDK also, as I
have seen the industry always wanting the best performance. DPDK can use
SIMD instructions, all AVX, SSE and MMX stuff, and they do it.

Users still depend on specific versions of the kernel due to which
functions are exported to XDP.

Debugging is harder but currently worked on. But will probably always be
harder than simply using a debugger.

This all leads to gigantic user space control planes like neutron and
others that just make everyone's life much harder. The model requires
this. And that is what I fear.

I am not at all that negative against a hook before allocating the
packet, but making everyone using it and marketing as an alternative to
DPDK doesn't seem to fit for me.

It seems to me if XDP should even remotely be comparable to DPDK a lot
of complexity has to be added. It is not Linux network stack for me
either so far.

Sorry if I hijacked the bounded loop discussion for the rant. I am happy
to discuss or follow-up on ideas regarding lifting the looping limit in
eBPF.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp, well meaning but pointless
  2016-12-03 19:48     ` John Fastabend
@ 2016-12-05 11:04       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 40+ messages in thread
From: Jesper Dangaard Brouer @ 2016-12-05 11:04 UTC (permalink / raw)
  To: John Fastabend
  Cc: Willem de Bruijn, Florian Westphal, Network Development, brouer

On Sat, 3 Dec 2016 11:48:22 -0800
John Fastabend <john.fastabend@gmail.com> wrote:

> On 16-12-03 08:19 AM, Willem de Bruijn wrote:
> > On Fri, Dec 2, 2016 at 12:22 PM, Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:  
> >>
> >> On Thu, 1 Dec 2016 10:11:08 +0100 Florian Westphal <fw@strlen.de> wrote:
> >>  
> >>> In light of DPDKs existence it make a lot more sense to me to provide
> >>> a). a faster mmap based interface (possibly AF_PACKET based) that allows
> >>> to map nic directly into userspace, detaching tx/rx queue from kernel.
> >>>
> >>> John Fastabend sent something like this last year as a proof of
> >>> concept, iirc it was rejected because register space got exposed directly
> >>> to userspace.  I think we should re-consider merging netmap
> >>> (or something conceptually close to its design).  
> >>
> >> I'm actually working in this direction, of zero-copy RX mapping packets
> >> into userspace.  This work is mostly related to page_pool, and I only
> >> plan to use XDP as a filter for selecting packets going to userspace,
> >> as this choice need to be taken very early.
> >>
> >> My design is here:
> >>  https://prototype-kernel.readthedocs.io/en/latest/vm/page_pool/design/memory_model_nic.html
> >>
> >> This is mostly about changing the memory model in the drivers, to allow
> >> for safely mapping pages to userspace.  (An efficient queue mechanism is
> >> not covered).  
> > 
> > Virtio virtqueues are used in various other locations in the stack.
> > With separate memory pools and send + completion descriptor rings,
> > signal moderation, careful avoidance of cacheline bouncing, etc. these
> > seem like a good opportunity for a TPACKET_V4 format.
> >   
> 
> FWIW. After we rejected exposing the register space to user space due to
> valid security issues we fell back to using VFIO which works nicely for
> mapping virtual functions into userspace and VMs. The main  drawback is
> user space has to manage the VF but that is mostly a solved problem at
> this point. Deployment concerns aside.

Using VFs (PCIe SR-IOV Virtual Functions) solves this in a completely
different orthogonal way.  To me it is still like taking over the entire
NIC, although you use HW to split the traffic into VFs.  Setup for VF
deployment still looks troubling like 1G hugepages and vfio
enable_unsafe_noiommu_mode=1.  And generally getting SR-IOV working on
your HW is a task of it's own.

One thing people often seem to miss with SR-IOV VFs is that VM-to-VM
traffic will be limited by PCIe bandwidth and transaction overheads.
Like Stepen Hemminger demonstrated[1] at NetDev 1.2 and Luigi also have
a paper demonstrating this (AFAICR).
[1] http://netdevconf.org/1.2/session.html?stephen-hemminger


A key difference in my design is to, allow the NIC to be shared in a
safe manor.  The NIC functions 100% as a normal Linux controlled NIC.
The catch is that once an application request zero-copy RX, then the
NIC might have to reconfigure it's RX-ring usage.  As the driver MUST
change into what I call the "read-only packet page" mode, which
actually is the default in many drivers today.


> There was a TPACKET_V4 version we had a prototype of that passed
> buffers down to the hardware to use with the dma engine. This gives
> zero-copy but same as VFs requires the hardware to do all the steering
> of traffic and any expected policy in front of the application. Due to
> requiring user space to kick hardware and vice versa though it was
> somewhat slower so I didn't finish it up. The kick was implemented as a
> syscall iirc. I can maybe look at it a bit more next week and see if its
> worth reviving now in this context.

This is still at the design stage.  The target here is that the
page_pool and driver adjustments will provide the basis for building RX
zero-copy solutions in a memory safe manor.

I do see tcpdump/RAW packet access like TPACKET_V4 being one of the
first users of this.  Not the only user, as further down the road, I
also imagine RX zero-copy delivery into sockets (and perhaps combined
with a "raw_demux" step that doesn't alloc the SKB, which Tom hinted in
the other thread for UDP delivery).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-02 19:25               ` Hannes Frederic Sowa
  2016-12-02 19:42                 ` John Fastabend
  2016-12-02 19:42                 ` Hannes Frederic Sowa
@ 2016-12-05 16:40                 ` Edward Cree
  2016-12-05 16:50                   ` Hannes Frederic Sowa
  2 siblings, 1 reply; 40+ messages in thread
From: Edward Cree @ 2016-12-05 16:40 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On 02/12/16 19:25, Hannes Frederic Sowa wrote:
> On 02.12.2016 19:39, Alexei Starovoitov wrote:
>> Hannes,
>> Not too long ago you proposed a very interesting idea to add
>> support for bounded loops without adding any new bpf instructions and
>> changing llvm (which was way better than my 'rep' like instructions
>> I was experimenting with). I thought systemtap guys also wanted bounded
>> loops and you were cooperating on the design, so I gave up on my work and
>> was expecting an imminent patch from you. I guess it sounds like you know
>> believe that bounded loops are impossible or I misunderstand your statement ?
> Your argument was that it would need a new verifier as the current first
> pass checks that we indeed can lay out the basic blocks as a DAG which
> the second pass depends on. This would be violated.
I may be completely mistaken here, but can't the verifier unroll the loop 'for
verification' without it actually being unrolled in the program?
I.e., any "proof that the loop terminates" should translate into "rewrite of
the directed graph to make it a DAG, possibly duplicating a lot of insns", and
you feed the rewritten graph to the verifier, while using the original loopy
version as the actual program to store and later execute.
Then the verifier happily checks things like array indices being valid, without
having to know about the bounded loops.

-Ed

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-05 16:40                 ` Edward Cree
@ 2016-12-05 16:50                   ` Hannes Frederic Sowa
  2016-12-05 16:54                     ` Edward Cree
  0 siblings, 1 reply; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-05 16:50 UTC (permalink / raw)
  To: Edward Cree, Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On 05.12.2016 17:40, Edward Cree wrote:
> On 02/12/16 19:25, Hannes Frederic Sowa wrote:
>> On 02.12.2016 19:39, Alexei Starovoitov wrote:
>>> Hannes,
>>> Not too long ago you proposed a very interesting idea to add
>>> support for bounded loops without adding any new bpf instructions and
>>> changing llvm (which was way better than my 'rep' like instructions
>>> I was experimenting with). I thought systemtap guys also wanted bounded
>>> loops and you were cooperating on the design, so I gave up on my work and
>>> was expecting an imminent patch from you. I guess it sounds like you know
>>> believe that bounded loops are impossible or I misunderstand your statement ?
>> Your argument was that it would need a new verifier as the current first
>> pass checks that we indeed can lay out the basic blocks as a DAG which
>> the second pass depends on. This would be violated.
> I may be completely mistaken here, but can't the verifier unroll the loop 'for
> verification' without it actually being unrolled in the program?
> I.e., any "proof that the loop terminates" should translate into "rewrite of
> the directed graph to make it a DAG, possibly duplicating a lot of insns", and
> you feed the rewritten graph to the verifier, while using the original loopy
> version as the actual program to store and later execute.
> Then the verifier happily checks things like array indices being valid, without
> having to know about the bounded loops.

That is what is already happening. E.g. __builtin_memset is expanded up
to 128 rounds (which is a lot) but at some point llvm doesn't do enoug
unrolling of that.

The BPF target configures that in
http://llvm.org/docs/doxygen/html/BPFISelLowering_8cpp_source.html on
line 166-169.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-05 16:50                   ` Hannes Frederic Sowa
@ 2016-12-05 16:54                     ` Edward Cree
  2016-12-06 11:35                       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 40+ messages in thread
From: Edward Cree @ 2016-12-05 16:54 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On 05/12/16 16:50, Hannes Frederic Sowa wrote:
> On 05.12.2016 17:40, Edward Cree wrote:
>> I may be completely mistaken here, but can't the verifier unroll the loop 'for
>> verification' without it actually being unrolled in the program?
>> I.e., any "proof that the loop terminates" should translate into "rewrite of
>> the directed graph to make it a DAG, possibly duplicating a lot of insns", and
>> you feed the rewritten graph to the verifier, while using the original loopy
>> version as the actual program to store and later execute.
>> Then the verifier happily checks things like array indices being valid, without
>> having to know about the bounded loops.
> That is what is already happening. E.g. __builtin_memset is expanded up
> to 128 rounds (which is a lot) but at some point llvm doesn't do enoug
> unrolling of that.
>
> The BPF target configures that in
> http://llvm.org/docs/doxygen/html/BPFISelLowering_8cpp_source.html on
> line 166-169.
I think you're talking about the _compiler_ unrolling loops before it
submits the program to the kernel.  I'm talking about having the _verifier_
unroll them, so that we can execute the original (non-unrolled) version.
Or am I misunderstanding?

-Ed

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp Was: Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-04 16:05                     ` [flamebait] xdp Was: " Hannes Frederic Sowa
@ 2016-12-06  3:05                       ` Alexei Starovoitov
  2016-12-06  5:08                         ` Tom Herbert
  0 siblings, 1 reply; 40+ messages in thread
From: Alexei Starovoitov @ 2016-12-06  3:05 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On Sun, Dec 04, 2016 at 05:05:28PM +0100, Hannes Frederic Sowa wrote:
>
> If one of those eBPF verifiers only accepts a certain number of INSN, as
> fundamental as backwards jumps, we might end up with two compiler?

two compilers? We already have five. There is gcc bpf backend (unmaintained)
and now lua, python and ply project can generate bpf code without llvm.
The kernel verifier has to become smarter. Right now it understands
only certain instruction patterns which caused all five bpf generators to
do extra work to satisfy the verifier. The solution is to do
data flow analysis using proper compiler techniques.

> program thinks). Ergo, more complexity. What do you do when one of those
> two systems fail? What is the reference data? What do you do if on a
> highly busy box during DoS constant reloading of your vmalloc happens (I
> don't know if it is a problem under DoS)?

ddos is one of the key use cases for xdp. If the system is about to oom
during ddos, it has to be fixed. The faster we move with xdp development
the sooner we will find and fix those issues.
And xdp being a core component of the linux kernel we will fix ddos
for the whole internet. Anyone going dpdk route are simply in
business of selling ddos protection with proprietary solutions.

> I tried to argue that someone wanting to build netmap/DPDK-alike things
> in XDP, one faces the problem of synchronized IPC. Hashmaps solve this
> to some degree but cannot be synchronized.

I don't see ipc as a problem and, yes, xdp is the best platform so far
to deliver packets to user space. I think that the dataplane-in-the-driver
is going to be faster than the fastest streaming to user space approach,
but we cannot rule one way or the other without trying multiple
approaches first and benchmarking them against each other.
So I very much in favor of Jesper's effort to deliver packets to user space.

> DPDK even can configure various hw offloads already before the kernel
> can do so.

that's a harsh lesson that the kernel needs to learn. Since people went
to dpdk to do hw offload it means it's our fault that we were not
accommodative and flexible enough to provide such frameworks within
the kernel. imo John's flow/match api should have been accepted
and it would have been solid building block towards such offloads.

> If users want to use those, they switch to DPDK also, as I
> have seen the industry always wanting the best performance. DPDK can use
> SIMD instructions, all AVX, SSE and MMX stuff, and they do it.

agree as well. The kernel needs to find a way to use all of these
fancy instructions where performance matters.
People who say "kernel cannot do simd" just didn't try hard enough.

> Debugging is harder but currently worked on. But will probably always be
> harder than simply using a debugger.

That's actually the important value proposition of xdp+bpf, since
non-working bpf program is not a concern for the kernel support team.
Unlike kernel modules that the kernel team needs to bless and support
in production, bpf programs are outside of that scope. They are part
of user space apps and part of user space responsibility.

> This all leads to gigantic user space control planes like neutron and
> others that just make everyone's life much harder. The model requires
> this. And that is what I fear.

the neutron is complex and fragile, since it's using bridges on
top of bridges with ebtables and ovs in the mix. Trying to manage
many different kernel technologies and a mix of smaller control planes
by this mega control plane is not an easy task.

> I am not at all that negative against a hook before allocating the
> packet, but making everyone using it and marketing as an alternative to
> DPDK doesn't seem to fit for me.

I don't see developers that are forced to use xdp. I see developers
that are eager to use xdp as soon as support for it is available
in their nics. Those like maglev who developed their own bypass
are not going to use dpdk and people who already using dpdk are
not going to switch to xdp, but there are lots of others who
welcome xdp with open arms.

Thanks

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp Was: Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-06  3:05                       ` Alexei Starovoitov
@ 2016-12-06  5:08                         ` Tom Herbert
  2016-12-06  6:04                           ` Alexei Starovoitov
  0 siblings, 1 reply; 40+ messages in thread
From: Tom Herbert @ 2016-12-06  5:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Hannes Frederic Sowa, Thomas Graf,
	Linux Kernel Network Developers, Daniel Borkmann,
	David S. Miller

On Mon, Dec 5, 2016 at 7:05 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Sun, Dec 04, 2016 at 05:05:28PM +0100, Hannes Frederic Sowa wrote:
>>
>> If one of those eBPF verifiers only accepts a certain number of INSN, as
>> fundamental as backwards jumps, we might end up with two compiler?
>
> two compilers? We already have five. There is gcc bpf backend (unmaintained)
> and now lua, python and ply project can generate bpf code without llvm.
> The kernel verifier has to become smarter. Right now it understands
> only certain instruction patterns which caused all five bpf generators to
> do extra work to satisfy the verifier. The solution is to do
> data flow analysis using proper compiler techniques.
>
>> program thinks). Ergo, more complexity. What do you do when one of those
>> two systems fail? What is the reference data? What do you do if on a
>> highly busy box during DoS constant reloading of your vmalloc happens (I
>> don't know if it is a problem under DoS)?
>
> ddos is one of the key use cases for xdp. If the system is about to oom
> during ddos, it has to be fixed. The faster we move with xdp development
> the sooner we will find and fix those issues.
> And xdp being a core component of the linux kernel we will fix ddos
> for the whole internet. Anyone going dpdk route are simply in
> business of selling ddos protection with proprietary solutions.
>
Hi Alexei,

I am wondering exactly how XDP fixes DDOS in a non-proprietary
fashion. While the XDP infrastructure is part of the core kernel, the
programs are not part of the kernel as you mention below. So what will
a DDOS solution based on XDP for the whole Internet look like? Do you
envision a set of "blessed" DDOS programs that various sites can use
and configure (maybe some maintained open source repository), or will
each site need to come up with their own XDP programs for DDOS?

Thanks,
Tom

>> I tried to argue that someone wanting to build netmap/DPDK-alike things
>> in XDP, one faces the problem of synchronized IPC. Hashmaps solve this
>> to some degree but cannot be synchronized.
>
> I don't see ipc as a problem and, yes, xdp is the best platform so far
> to deliver packets to user space. I think that the dataplane-in-the-driver
> is going to be faster than the fastest streaming to user space approach,
> but we cannot rule one way or the other without trying multiple
> approaches first and benchmarking them against each other.
> So I very much in favor of Jesper's effort to deliver packets to user space.
>
>> DPDK even can configure various hw offloads already before the kernel
>> can do so.
>
> that's a harsh lesson that the kernel needs to learn. Since people went
> to dpdk to do hw offload it means it's our fault that we were not
> accommodative and flexible enough to provide such frameworks within
> the kernel. imo John's flow/match api should have been accepted
> and it would have been solid building block towards such offloads.
>
>> If users want to use those, they switch to DPDK also, as I
>> have seen the industry always wanting the best performance. DPDK can use
>> SIMD instructions, all AVX, SSE and MMX stuff, and they do it.
>
> agree as well. The kernel needs to find a way to use all of these
> fancy instructions where performance matters.
> People who say "kernel cannot do simd" just didn't try hard enough.
>
>> Debugging is harder but currently worked on. But will probably always be
>> harder than simply using a debugger.
>
> That's actually the important value proposition of xdp+bpf, since
> non-working bpf program is not a concern for the kernel support team.
> Unlike kernel modules that the kernel team needs to bless and support
> in production, bpf programs are outside of that scope. They are part
> of user space apps and part of user space responsibility.
>
>> This all leads to gigantic user space control planes like neutron and
>> others that just make everyone's life much harder. The model requires
>> this. And that is what I fear.
>
> the neutron is complex and fragile, since it's using bridges on
> top of bridges with ebtables and ovs in the mix. Trying to manage
> many different kernel technologies and a mix of smaller control planes
> by this mega control plane is not an easy task.
>
>> I am not at all that negative against a hook before allocating the
>> packet, but making everyone using it and marketing as an alternative to
>> DPDK doesn't seem to fit for me.
>
> I don't see developers that are forced to use xdp. I see developers
> that are eager to use xdp as soon as support for it is available
> in their nics. Those like maglev who developed their own bypass
> are not going to use dpdk and people who already using dpdk are
> not going to switch to xdp, but there are lots of others who
> welcome xdp with open arms.
>
> Thanks
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [flamebait] xdp Was: Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-06  5:08                         ` Tom Herbert
@ 2016-12-06  6:04                           ` Alexei Starovoitov
  0 siblings, 0 replies; 40+ messages in thread
From: Alexei Starovoitov @ 2016-12-06  6:04 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Hannes Frederic Sowa, Thomas Graf,
	Linux Kernel Network Developers, Daniel Borkmann,
	David S. Miller

On Mon, Dec 05, 2016 at 09:08:36PM -0800, Tom Herbert wrote:
> On Mon, Dec 5, 2016 at 7:05 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Sun, Dec 04, 2016 at 05:05:28PM +0100, Hannes Frederic Sowa wrote:
> >>
> >> If one of those eBPF verifiers only accepts a certain number of INSN, as
> >> fundamental as backwards jumps, we might end up with two compiler?
> >
> > two compilers? We already have five. There is gcc bpf backend (unmaintained)
> > and now lua, python and ply project can generate bpf code without llvm.
> > The kernel verifier has to become smarter. Right now it understands
> > only certain instruction patterns which caused all five bpf generators to
> > do extra work to satisfy the verifier. The solution is to do
> > data flow analysis using proper compiler techniques.
> >
> >> program thinks). Ergo, more complexity. What do you do when one of those
> >> two systems fail? What is the reference data? What do you do if on a
> >> highly busy box during DoS constant reloading of your vmalloc happens (I
> >> don't know if it is a problem under DoS)?
> >
> > ddos is one of the key use cases for xdp. If the system is about to oom
> > during ddos, it has to be fixed. The faster we move with xdp development
> > the sooner we will find and fix those issues.
> > And xdp being a core component of the linux kernel we will fix ddos
> > for the whole internet. Anyone going dpdk route are simply in
> > business of selling ddos protection with proprietary solutions.
> >
> Hi Alexei,
> 
> I am wondering exactly how XDP fixes DDOS in a non-proprietary
> fashion. While the XDP infrastructure is part of the core kernel, the
> programs are not part of the kernel as you mention below. So what will
> a DDOS solution based on XDP for the whole Internet look like? Do you
> envision a set of "blessed" DDOS programs that various sites can use
> and configure (maybe some maintained open source repository), or will
> each site need to come up with their own XDP programs for DDOS?

At some point we would need a repository of these 'blessed' programs.
Some of them will not be programs, but program generators
similar to existing Cloudflare bpf setup:
https://github.com/cloudflare/bpftools
and instead of doing things like:
https://github.com/cloudflare/lua-aho-corasick
and reimplementing them in proprietary c++,
the dfa/aho-corasick will be implemented as a kernel helper.
That's what I was alluding to in
https://github.com/iovisor/bcc/issues/471
Then all of the research in that area like:
https://ir.nctu.edu.tw/bitstream/11536/26033/1/000288319400006.pdf
will be applicable and researchers will be sharing
these detector programs.
Of course, not everyone will open up their secret sauce,
but a lot of folks will do and it will drive the innovation.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: bpf bounded loops. Was: [flamebait] xdp
  2016-12-05 16:54                     ` Edward Cree
@ 2016-12-06 11:35                       ` Hannes Frederic Sowa
  0 siblings, 0 replies; 40+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-06 11:35 UTC (permalink / raw)
  To: Edward Cree, Alexei Starovoitov
  Cc: Tom Herbert, Thomas Graf, Linux Kernel Network Developers,
	Daniel Borkmann, David S. Miller

On 05.12.2016 17:54, Edward Cree wrote:
> On 05/12/16 16:50, Hannes Frederic Sowa wrote:
>> On 05.12.2016 17:40, Edward Cree wrote:
>>> I may be completely mistaken here, but can't the verifier unroll the loop 'for
>>> verification' without it actually being unrolled in the program?
>>> I.e., any "proof that the loop terminates" should translate into "rewrite of
>>> the directed graph to make it a DAG, possibly duplicating a lot of insns", and
>>> you feed the rewritten graph to the verifier, while using the original loopy
>>> version as the actual program to store and later execute.
>>> Then the verifier happily checks things like array indices being valid, without
>>> having to know about the bounded loops.
>> That is what is already happening. E.g. __builtin_memset is expanded up
>> to 128 rounds (which is a lot) but at some point llvm doesn't do enoug
>> unrolling of that.
>>
>> The BPF target configures that in
>> http://llvm.org/docs/doxygen/html/BPFISelLowering_8cpp_source.html on
>> line 166-169.
> I think you're talking about the _compiler_ unrolling loops before it
> submits the program to the kernel.  I'm talking about having the _verifier_
> unroll them, so that we can execute the original (non-unrolled) version.
> Or am I misunderstanding?

Ah, in the verifier this would be part of flow control analysis what we
are talking about in the other part of this thread.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2016-12-06 11:36 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-01  9:11 [flamebait] xdp, well meaning but pointless Florian Westphal
2016-12-01 13:42 ` Hannes Frederic Sowa
2016-12-01 14:58 ` Thomas Graf
2016-12-01 15:52   ` Hannes Frederic Sowa
2016-12-01 16:28     ` Thomas Graf
2016-12-01 20:44       ` Hannes Frederic Sowa
2016-12-01 21:12         ` Tom Herbert
2016-12-01 21:27           ` Hannes Frederic Sowa
2016-12-01 21:51             ` Tom Herbert
2016-12-02 10:24               ` Jesper Dangaard Brouer
2016-12-02 11:54                 ` Hannes Frederic Sowa
2016-12-02 16:59                   ` Tom Herbert
2016-12-02 18:12                     ` Hannes Frederic Sowa
2016-12-02 19:56                       ` Stephen Hemminger
2016-12-02 20:19                         ` Tom Herbert
2016-12-02 18:39             ` bpf bounded loops. Was: [flamebait] xdp Alexei Starovoitov
2016-12-02 19:25               ` Hannes Frederic Sowa
2016-12-02 19:42                 ` John Fastabend
2016-12-02 19:50                   ` Hannes Frederic Sowa
2016-12-03  0:20                   ` Alexei Starovoitov
2016-12-03  9:11                     ` Sargun Dhillon
2016-12-02 19:42                 ` Hannes Frederic Sowa
2016-12-02 23:34                   ` Alexei Starovoitov
2016-12-04 16:05                     ` [flamebait] xdp Was: " Hannes Frederic Sowa
2016-12-06  3:05                       ` Alexei Starovoitov
2016-12-06  5:08                         ` Tom Herbert
2016-12-06  6:04                           ` Alexei Starovoitov
2016-12-05 16:40                 ` Edward Cree
2016-12-05 16:50                   ` Hannes Frederic Sowa
2016-12-05 16:54                     ` Edward Cree
2016-12-06 11:35                       ` Hannes Frederic Sowa
2016-12-01 16:06   ` [flamebait] xdp, well meaning but pointless Florian Westphal
2016-12-01 16:19   ` David Miller
2016-12-01 16:51     ` Florian Westphal
2016-12-01 17:20     ` Hannes Frederic Sowa
     [not found] ` <CALx6S35R_ZStV=DbD-7Gf_y5xXqQq113_6m5p-p0GQfv46v0Ow@mail.gmail.com>
2016-12-01 18:02   ` Tom Herbert
2016-12-02 17:22 ` Jesper Dangaard Brouer
2016-12-03 16:19   ` Willem de Bruijn
2016-12-03 19:48     ` John Fastabend
2016-12-05 11:04       ` Jesper Dangaard Brouer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.