[PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max

bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len
@ 2020-09-04  9:30 Jesper Dangaard Brouer
  2020-09-04 23:39 ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2020-09-04  9:30 UTC (permalink / raw)
  To: bpf
  Cc: Jesper Dangaard Brouer, netdev, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend

Multiple BPF-helpers that can manipulate/increase the size of the SKB uses
__bpf_skb_max_len() as the max-length. This function limit size against the
current net_device MTU (skb->dev->mtu).

Often packets gets redirected to another net_device, that can have a larger
MTU, and this is the MTU that should count. The MTU limiting at this stage
seems wrong and redundant as the netstack will handle MTU checking
elsewhere.

Redirecting into sockmap by sk_skb programs already skip this MTU check.
Keep what commit 0c6bc6e531a6 ("bpf: fix sk_skb programs without skb->dev
assigned") did, and limit the max_len to SKB_MAX_ALLOC.

Also notice that the max_len MTU check is already skipped for GRO SKBs
(skb_is_gso), in both bpf_skb_adjust_room() and bpf_skb_change_head().
Thus, it is clearly safe to remove this check.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 47eef9a0be6a..ec0ed107fa37 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3211,8 +3211,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,

 static u32 __bpf_skb_max_len(const struct sk_buff *skb)
 {
-	return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
-			  SKB_MAX_ALLOC;
+	return SKB_MAX_ALLOC;
 }

 BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len
  2020-09-04  9:30 [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len Jesper Dangaard Brouer
@ 2020-09-04 23:39 ` Jakub Kicinski
  2020-09-07 14:07   ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2020-09-04 23:39 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, netdev, Daniel Borkmann, Alexei Starovoitov, John Fastabend

On Fri, 04 Sep 2020 11:30:28 +0200 Jesper Dangaard Brouer wrote:
> @@ -3211,8 +3211,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>  
>  static u32 __bpf_skb_max_len(const struct sk_buff *skb)
>  {
> -	return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> -			  SKB_MAX_ALLOC;
> +	return SKB_MAX_ALLOC;
>  }
>  
>  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
> 

Looks familiar:

https://lore.kernel.org/netdev/20200420231427.63894-1-zenczykowski@gmail.com/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len
  2020-09-04 23:39 ` Jakub Kicinski
@ 2020-09-07 14:07   ` Jesper Dangaard Brouer
  2020-09-10 20:00     ` Maciej Żenczykowski
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2020-09-07 14:07 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: bpf, netdev, Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	brouer, Maciej Zenczykowski

On Fri, 4 Sep 2020 16:39:47 -0700
Jakub Kicinski <kuba@kernel.org> wrote:

> On Fri, 04 Sep 2020 11:30:28 +0200 Jesper Dangaard Brouer wrote:
> > @@ -3211,8 +3211,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
> >  
> >  static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> >  {
> > -	return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> > -			  SKB_MAX_ALLOC;
> > +	return SKB_MAX_ALLOC;
> >  }
> >  
> >  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
> >   
> 
> Looks familiar:
> https://lore.kernel.org/netdev/20200420231427.63894-1-zenczykowski@gmail.com/
> 

Great to see that others have proposed same fix before.  Unfortunately
it seems that the thread have died, and no patch got applied to
address this.  (Cc. Maze since he was "mull this over a bit more"...)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len
  2020-09-07 14:07   ` Jesper Dangaard Brouer
@ 2020-09-10 20:00     ` Maciej Żenczykowski
  2020-09-14 14:05       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Maciej Żenczykowski @ 2020-09-10 20:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jakub Kicinski, bpf, Linux NetDev, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend

All recent Android R common kernels are currently carrying the
following divergence from upstream:

https://android.googlesource.com/kernel/common/+/194a1bf09a7958551a9e2dc947bdfe3f8be8eca8%5E%21/

static u32 __bpf_skb_max_len(const struct sk_buff *skb)
 {
- return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
-  SKB_MAX_ALLOC;
+ if (skb_at_tc_ingress(skb) || !skb->dev)
+ return SKB_MAX_ALLOC;
+ return skb->dev->mtu + skb->dev->hard_header_len;
 }

There wasn't agreement on how to handle this upstream because some
folks thought this check was useful...
Myself - I'm not entirely certain...
I'd like to be able to test for (something like) this, yes, but the
way it's done now is kind of pointless...
It breaks for gso packets anyway - it's not true that a gso packet can
just ignore the mtu check, you do actually need to check individual
gso segments are sufficiently small...
You need to check against the right interface, which again in the
presence of bpf redirect it currently utterly fails.
Checking on receive just doesn't seem useful, so what if I want to
increase packet size that arrives at the stack?
I also don't understand where SKB_MAX_ALLOC even comes from... skb's
on lo/veth can be 64KB not SKB_MAX_ALLOC (which ifirc is 16KB).

I think maybe there's now sufficient access to skb->len &
gso_segs/size to implement this in bpf instead of relying on the
kernel checking it???
But that might be slow...

It sounded like it was trending towards some sort of larger scale refactoring.

I haven't had the opportunity to take another look at this since then.
I'm not at all sure what would break if we just utterly deleted these
pkt too big > mtu checks.

In general in my experience bpf poorly handles gso and mtu and this is
an area in need of improvement.
I've been planning to get around to this, but am currently busy with a
bazillion other higher priority things :-(
Like trying to figure out whether XDP is even usable with real world
hardware limitations (currently the answer is still leaning towards
no, though there was some slightly positive news in the past few
days).  And whether we can even reach our performance goals with
jit'ed bpf... or do we need to just write it in kernel C... :-(

On Mon, Sep 7, 2020 at 7:08 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> On Fri, 4 Sep 2020 16:39:47 -0700
> Jakub Kicinski <kuba@kernel.org> wrote:
>
> > On Fri, 04 Sep 2020 11:30:28 +0200 Jesper Dangaard Brouer wrote:
> > > @@ -3211,8 +3211,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
> > >
> > >  static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> > >  {
> > > -   return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> > > -                     SKB_MAX_ALLOC;
> > > +   return SKB_MAX_ALLOC;
> > >  }
> > >
> > >  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
> > >
> >
> > Looks familiar:
> > https://lore.kernel.org/netdev/20200420231427.63894-1-zenczykowski@gmail.com/
> >
>
> Great to see that others have proposed same fix before.  Unfortunately
> it seems that the thread have died, and no patch got applied to
> address this.  (Cc. Maze since he was "mull this over a bit more"...)
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len
  2020-09-10 20:00     ` Maciej Żenczykowski
@ 2020-09-14 14:05       ` Jesper Dangaard Brouer
  2020-09-14 20:50         ` Maciej Żenczykowski
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2020-09-14 14:05 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Jakub Kicinski, bpf, Linux NetDev, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, brouer


Hi Maze,

Thanks for getting back to me, I appreciate that a lot.
More inline below:

On Thu, 10 Sep 2020 13:00:12 -0700
Maciej Żenczykowski <maze@google.com> wrote:

> All recent Android R common kernels are currently carrying the
> following divergence from upstream:
> 
> https://android.googlesource.com/kernel/common/+/194a1bf09a7958551a9e2dc947bdfe3f8be8eca8%5E%21/
> 
> static u32 __bpf_skb_max_len(const struct sk_buff *skb)
>  {
> - return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> -  SKB_MAX_ALLOC;
> + if (skb_at_tc_ingress(skb) || !skb->dev)
> + return SKB_MAX_ALLOC;
> + return skb->dev->mtu + skb->dev->hard_header_len;
>  }

Thanks for sharing that Android now have this out-of-tree patch. I'm
obviously annoyed that this was not upstreamed, as it hurts both you
and me, but we do live in an imperfect world ;)


> There wasn't agreement on how to handle this upstream because some
> folks thought this check was useful...
> Myself - I'm not entirely certain...
> I'd like to be able to test for (something like) this, yes, but the
> way it's done now is kind of pointless...
> It breaks for gso packets anyway - it's not true that a gso packet can
> just ignore the mtu check, you do actually need to check individual
> gso segments are sufficiently small...
> You need to check against the right interface, which again in the
> presence of bpf redirect it currently utterly fails.

I agree that the current check is done against the wrong interface.

> Checking on receive just doesn't seem useful, so what if I want to
> increase packet size that arrives at the stack?

It seems very practical to allow increase packet size of received
packet, also for local netstack deliver.  (e.g allowing to add encap
headers, without being limited to RX device MTU).


> I also don't understand where SKB_MAX_ALLOC even comes from... skb's
> on lo/veth can be 64KB not SKB_MAX_ALLOC (which ifirc is 16KB).

It was John that added the 16KiB SKB_MAX_ALLOC limit...
Why this value John?


> I think maybe there's now sufficient access to skb->len &
> gso_segs/size to implement this in bpf instead of relying on the
> kernel checking it???
> But that might be slow...
> 
> It sounded like it was trending towards some sort of larger scale refactoring.
> 
> I haven't had the opportunity to take another look at this since then.
> I'm not at all sure what would break if we just utterly deleted these
> pkt too big > mtu checks.

I'm looking at the code, and TC-ingress redirect to TC-egress and
following code into driver (ixgbe) it does look like we don't have
anything that limit/check the MTU before sending it out the driver (and
the specific driver also didn't limit this).

Thus, I think this patch is not enough on its own.  We/I likely need to
move the MTU check (instead of simply removing it), but based on the
egress device, and not the ingress device.  I will look more into this.


> In general in my experience bpf poorly handles gso and mtu and this is
> an area in need of improvement.
> I've been planning to get around to this, but am currently busy with a
> bazillion other higher priority things :-(
>
> Like trying to figure out whether XDP is even usable with real world
> hardware limitations (currently the answer is still leaning towards
> no, though there was some slightly positive news in the past few
> days).

Getting XDP support in all the different Android drivers seems like an
impossible task.  And you don't want to use generic-XDP, because it
will very likely cause a SKB re-allocation and copy of the data.

I think TC-BPF will likely be the better choice in the Android ecosystem.


> And whether we can even reach our performance goals with
> jit'ed bpf... or do we need to just write it in kernel C... :-(

My experience is that Jit'ed BPF code is super fast, also for the ARM
64-bit experiments:

 https://github.com/xdp-project/xdp-project/tree/master/areas/arm64

--Jesper

 
> On Mon, Sep 7, 2020 at 7:08 AM Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> > On Fri, 4 Sep 2020 16:39:47 -0700
> > Jakub Kicinski <kuba@kernel.org> wrote:
> >  
> > > On Fri, 04 Sep 2020 11:30:28 +0200 Jesper Dangaard Brouer wrote:  
> > > > @@ -3211,8 +3211,7 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
> > > >
> > > >  static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> > > >  {
> > > > -   return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> > > > -                     SKB_MAX_ALLOC;
> > > > +   return SKB_MAX_ALLOC;
> > > >  }
> > > >
> > > >  BPF_CALL_4(bpf_skb_adjust_room, struct sk_buff *, skb, s32, len_diff,
> > > >  
> > >
> > > Looks familiar:
> > > https://lore.kernel.org/netdev/20200420231427.63894-1-zenczykowski@gmail.com/
> > >  
> >
> > Great to see that others have proposed same fix before.  Unfortunately
> > it seems that the thread have died, and no patch got applied to
> > address this.  (Cc. Maze since he was "mull this over a bit more"...)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len
  2020-09-14 14:05       ` Jesper Dangaard Brouer
@ 2020-09-14 20:50         ` Maciej Żenczykowski
  2020-09-15  8:47           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 9+ messages in thread
From: Maciej Żenczykowski @ 2020-09-14 20:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jakub Kicinski, bpf, Linux NetDev, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend

On Mon, Sep 14, 2020 at 7:05 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
>
> Hi Maze,
>
> Thanks for getting back to me, I appreciate that a lot.
> More inline below:
>
> On Thu, 10 Sep 2020 13:00:12 -0700
> Maciej Żenczykowski <maze@google.com> wrote:
>
> > All recent Android R common kernels are currently carrying the
> > following divergence from upstream:
> >
> > https://android.googlesource.com/kernel/common/+/194a1bf09a7958551a9e2dc947bdfe3f8be8eca8%5E%21/
> >
> > static u32 __bpf_skb_max_len(const struct sk_buff *skb)
> >  {
> > - return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
> > -  SKB_MAX_ALLOC;
> > + if (skb_at_tc_ingress(skb) || !skb->dev)
> > + return SKB_MAX_ALLOC;
> > + return skb->dev->mtu + skb->dev->hard_header_len;
> >  }
>
> Thanks for sharing that Android now have this out-of-tree patch. I'm
> obviously annoyed that this was not upstreamed, as it hurts both you
> and me, but we do live in an imperfect world ;)

Yeah... it was too late in the dev cycle to reach consensus... and we
needed to ship. :-(
Since we strictly control the ebpf code I'm not (really) worried about
this exposing bugs.

(btw. besides there's another gso bpf 4to6/6to4 translation bug that's
already exposed and needs fixing,
problem is fix is not backwards compatible - so need to write an essay
about how stupid the current code
is and why it must be changed, even while potentially breaking userspace...
quick summary: the gso adjustments in 4to6/6to4 conversion funcs
should be outright deleted)

> > There wasn't agreement on how to handle this upstream because some
> > folks thought this check was useful...
> > Myself - I'm not entirely certain...
> > I'd like to be able to test for (something like) this, yes, but the
> > way it's done now is kind of pointless...
> > It breaks for gso packets anyway - it's not true that a gso packet can
> > just ignore the mtu check, you do actually need to check individual
> > gso segments are sufficiently small...
> > You need to check against the right interface, which again in the
> > presence of bpf redirect it currently utterly fails.
>
> I agree that the current check is done against the wrong interface.
>
> > Checking on receive just doesn't seem useful, so what if I want to
> > increase packet size that arrives at the stack?
>
> It seems very practical to allow increase packet size of received
> packet, also for local netstack deliver.  (e.g allowing to add encap
> headers, without being limited to RX device MTU).
>
>
> > I also don't understand where SKB_MAX_ALLOC even comes from... skb's
> > on lo/veth can be 64KB not SKB_MAX_ALLOC (which ifirc is 16KB).
>
> It was John that added the 16KiB SKB_MAX_ALLOC limit...
> Why this value John?
>
>
> > I think maybe there's now sufficient access to skb->len &
> > gso_segs/size to implement this in bpf instead of relying on the
> > kernel checking it???
> > But that might be slow...
> >
> > It sounded like it was trending towards some sort of larger scale refactoring.
> >
> > I haven't had the opportunity to take another look at this since then.
> > I'm not at all sure what would break if we just utterly deleted these
> > pkt too big > mtu checks.
>
> I'm looking at the code, and TC-ingress redirect to TC-egress and
> following code into driver (ixgbe) it does look like we don't have
> anything that limit/check the MTU before sending it out the driver (and
> the specific driver also didn't limit this).
>
> Thus, I think this patch is not enough on its own.  We/I likely need to
> move the MTU check (instead of simply removing it), but based on the
> egress device, and not the ingress device.  I will look more into this.

Yes, I think I agree that checking pkt > mtu (also for gso!) should be
done in some core 'just prior' to driver location.
However... this has the issue of being too late to return a clean
error to the bpf program itself :-(
Oh well?  Maybe we don't care?

> > In general in my experience bpf poorly handles gso and mtu and this is
> > an area in need of improvement.
> > I've been planning to get around to this, but am currently busy with a
> > bazillion other higher priority things :-(
> >
> > Like trying to figure out whether XDP is even usable with real world
> > hardware limitations (currently the answer is still leaning towards
> > no, though there was some slightly positive news in the past few
> > days).
>
> Getting XDP support in all the different Android drivers seems like an
> impossible task.  And you don't want to use generic-XDP, because it
> will very likely cause a SKB re-allocation and copy of the data.
>
> I think TC-BPF will likely be the better choice in the Android ecosystem.

Yeah currently it looks like our bottlenecks are full payload data
memcpy's in the cell and usb drivers (they're stupid!!!), combined
with the skb alloc/zero/free overhead costs.  Switching to XDP would
eliminate the latter...
But we'd end up with XDP with 2 data payload copies (into the XDP
frame in the cell driver, and out of the XDP frame in the usb driver).
So XDP and yet not.  Since the cell driver (at least 2 different ones
I've looked at) already does a full payload copy... this wouldn't
necessarily be any worse then it is now for normal stack destined
traffic.

I see some potential crazy workarounds:

Only run XDP on the first ~cacheline of the packet, so it can look at
ether/ip/tcp headers but not data - and modify (& prepend to) this
header.
We don't need data access anyway(*), add a new special driver transmit
function that takes a pointer/length to two read-only buffers -
without any memory ownership (ie. copy out prior to func returning).
With this approach I'm down to one copy of payload (2 copies of ip
headers)

* potential issue: we'd like to be able to do ipv6 udp checksum
calculation (ipv4 udp checksum 0 is valid, for ipv6 it is not, so
translation ipv4 to ipv6 udp requires it), which does require access
to full payload, but this could be hacked via partial checksum
implementation.
In this case the special xmit func would need to look like:
  ndp_xdp_light_xmit(const void* ptr1, u16 len1, const void* ptr2, u16
len2, u16 csum_start, u16 csum_offset, [int flags])

I'm also still trying to figure out whether the cell memory model and
usb/wifi hw capabilities would allow full zerocopy xdp, but I'm
virtually certain the answer is not on all hardware (requires at least
scatter gather capable usb controller, and cellular modem with full
RAM DMA access, and it looks like there's various DMA32 style
restrictions here, sometimes even more like DMA26...).

I don't currently understand the XDP memory model, so any description
there-of would be useful.  Does it require 4KB or can 2KB pieces be
used?  (remember I'm really only interested in XDP redirect across
drivers cell/wifi/usb gadget/usb dongle/ethernet/bluetooth)  How much
headroom/tailroom is needed (i'd need extra headroom for header
insertion/modification myself: how can I force the receiving
driver/nic to provide enough headroom, up to 40 ipv6 - 20 ipv4 + 8
ipv6 frag + 14 ethernet = 42 bytes, possibly 4 more for vlan, possibly
a few bytes more for tricks with scatter gather in usb ncm driver,
assuming zerocopy is doable)?  How should the memory be (ideally)
allocated/deallocated?  Is there a de-alloc callback?  How does that
work with conversion from xdp to skb, where skb free has no callback?
It might actually be ok, to force a payload memcpy during xdp->skb
upconversion...

> > And whether we can even reach our performance goals with
> > jit'ed bpf... or do we need to just write it in kernel C... :-(
>
> My experience is that Jit'ed BPF code is super fast, also for the ARM
> 64-bit experiments:
>
>  https://github.com/xdp-project/xdp-project/tree/master/areas/arm64

Would you happen to know what ebpf startup overhead is?
How big a problem is having two (or more) back to back tc programs
instead of one?
We're running into both verifier performance scaling problems and code
ownership issues with large programs...

[btw. I understand for XDP we could only use 1 program anyway...]

Hoping you'll do my work for me :-) or at least answer some of the
above questions to the best of your knowledge...

- Maciej

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len
  2020-09-14 20:50         ` Maciej Żenczykowski
@ 2020-09-15  8:47           ` Toke Høiland-Jørgensen
  2020-09-16  0:12             ` Maciej Żenczykowski
  0 siblings, 1 reply; 9+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-09-15  8:47 UTC (permalink / raw)
  To: Maciej Żenczykowski, Jesper Dangaard Brouer
  Cc: Jakub Kicinski, bpf, Linux NetDev, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend

[ just jumping in to answer this bit: ]

> Would you happen to know what ebpf startup overhead is?
> How big a problem is having two (or more) back to back tc programs
> instead of one?

With a jit'ed BPF program and the in-kernel dispatcher code (which
avoids indirect calls), it's quite close to a native function call.

> We're running into both verifier performance scaling problems and code
> ownership issues with large programs...
>
> [btw. I understand for XDP we could only use 1 program anyway...]

Working on that! See my talk at LPC:
https://linuxplumbersconf.org/event/7/contributions/671/

Will post a follow-up to the list once the freplace multi-attach series
lands.

-Toke


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len
  2020-09-15  8:47           ` Toke Høiland-Jørgensen
@ 2020-09-16  0:12             ` Maciej Żenczykowski
  2020-09-16 11:37               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 9+ messages in thread
From: Maciej Żenczykowski @ 2020-09-16  0:12 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, Jakub Kicinski, bpf, Linux NetDev,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend

On Tue, Sep 15, 2020 at 1:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> [ just jumping in to answer this bit: ]
>
> > Would you happen to know what ebpf startup overhead is?
> > How big a problem is having two (or more) back to back tc programs
> > instead of one?
>
> With a jit'ed BPF program and the in-kernel dispatcher code (which
> avoids indirect calls), it's quite close to a native function call.

Hmm, I know we have (had? they're upstream now I think) some CFI vs
BPF interaction issues.
We needed to mark the BPF call into JIT'ed code as CFI exempt.

CFI is Code Flow Integrity and is some compiler magic, to quote wikipedia:
Google has shipped Android with the Linux kernel compiled by Clang
with link-time optimization (LTO) and CFI since 2018.[12]
I don't know much more about it.

But we do BPF_JIT_ALWAYS_ON on 64-bit kernels, so it sounds like we
might be good.

> > We're running into both verifier performance scaling problems and code
> > ownership issues with large programs...
> >
> > [btw. I understand for XDP we could only use 1 program anyway...]
>
> Working on that! See my talk at LPC:
> https://linuxplumbersconf.org/event/7/contributions/671/

Yes, I'm aware and excited about it!

Unfortunately, Android S will only support 4.19, 5.4 and 5.10 for
newly launched devices (and 4.9/4.14 for upgrades).
(5.10 here means 'whatever is the next 5.x LTS', but that's most likely 5.10)
I don't (yet) even have real phone hardware running 5.4, and 5.10
within the next year is even more of a stretch.

> Will post a follow-up to the list once the freplace multi-attach series
> lands.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len
  2020-09-16  0:12             ` Maciej Żenczykowski
@ 2020-09-16 11:37               ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 9+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-09-16 11:37 UTC (permalink / raw)
  To: Maciej Żenczykowski
  Cc: Jesper Dangaard Brouer, Jakub Kicinski, bpf, Linux NetDev,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend

Maciej Żenczykowski <maze@google.com> writes:

> On Tue, Sep 15, 2020 at 1:47 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> [ just jumping in to answer this bit: ]
>>
>> > Would you happen to know what ebpf startup overhead is?
>> > How big a problem is having two (or more) back to back tc programs
>> > instead of one?
>>
>> With a jit'ed BPF program and the in-kernel dispatcher code (which
>> avoids indirect calls), it's quite close to a native function call.
>
> Hmm, I know we have (had? they're upstream now I think) some CFI vs
> BPF interaction issues.
> We needed to mark the BPF call into JIT'ed code as CFI exempt.
>
> CFI is Code Flow Integrity and is some compiler magic, to quote wikipedia:
> Google has shipped Android with the Linux kernel compiled by Clang
> with link-time optimization (LTO) and CFI since 2018.[12]
> I don't know much more about it.
>
> But we do BPF_JIT_ALWAYS_ON on 64-bit kernels, so it sounds like we
> might be good.

No idea about the CFI thing...

>> > We're running into both verifier performance scaling problems and code
>> > ownership issues with large programs...
>> >
>> > [btw. I understand for XDP we could only use 1 program anyway...]
>>
>> Working on that! See my talk at LPC:
>> https://linuxplumbersconf.org/event/7/contributions/671/
>
> Yes, I'm aware and excited about it!

Great! :)

> Unfortunately, Android S will only support 4.19, 5.4 and 5.10 for
> newly launched devices (and 4.9/4.14 for upgrades).
> (5.10 here means 'whatever is the next 5.x LTS', but that's most likely 5.10)
> I don't (yet) even have real phone hardware running 5.4, and 5.10
> within the next year is even more of a stretch.

Right, I saw your talk at LPC and of course the kernel version thing is
a bit of an issue. I suppose you could do some compile-time magic to
wrap programs and use the tail-call-based chaining for older kernels -
bit of a hassle, though :/

-Toke


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-09-16 20:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-04  9:30 [PATCH bpf-next] bpf: don't check against device MTU in __bpf_skb_max_len Jesper Dangaard Brouer
2020-09-04 23:39 ` Jakub Kicinski
2020-09-07 14:07   ` Jesper Dangaard Brouer
2020-09-10 20:00     ` Maciej Żenczykowski
2020-09-14 14:05       ` Jesper Dangaard Brouer
2020-09-14 20:50         ` Maciej Żenczykowski
2020-09-15  8:47           ` Toke Høiland-Jørgensen
2020-09-16  0:12             ` Maciej Żenczykowski
2020-09-16 11:37               ` Toke Høiland-Jørgensen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).