From: Daniel Borkmann <daniel@iogearbox.net>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org, xdp-newbies@vger.kernel.org
Subject: Re: [PATCH v2 net-next RFC] Generic XDP
Date: Mon, 10 Apr 2017 21:50:51 +0200
Message-ID: <58EBE21B.5000602@iogearbox.net>
In-Reply-To: <20170410021807.GA17150@ast-mbp.thefacebook.com>

On 04/10/2017 04:18 AM, Alexei Starovoitov wrote:
[...]
>> +	xdp.data_end = xdp.data + hlen;
>> +	xdp.data_hard_start = xdp.data - skb_headroom(skb);
>> +	orig_data = xdp.data;
>> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> +
>> +	off = xdp.data - orig_data;
>> +	if (off)
>> +		__skb_push(skb, off);
>
> and restore l2 back somehow and get new skb->protocol ?
> if we simply do __skb_pull(skb, skb->mac_len); like
> we do with cls_bpf, it will not work correctly,
> since if the program did ip->ipip encap (like our balancer
> does and the test tools/testing/selftests/bpf/test_xdp.c)
> the skb metadata fields will be wrong.
> So we need to repeat eth_type_trans() here if (xdp.data != orig_data)

Yeah, agreed. Also, when we have a GSO skb and rewrite/resize parts
of the packet, we would need to update the GSO-related shinfo
metadata accordingly (e.g. a rewrite from v4/v6, a rewrite of the
whole packet as an ICMP reply, etc.)?
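
To illustrate the point about re-deriving L2 metadata once the program
has moved xdp.data, here is a minimal userspace sketch, not kernel code.
struct pkt_model, read_ethertype() and fixup_after_prog() are
hypothetical names used only for illustration; the real fix would
presumably go through eth_type_trans() as Alexei suggests, and the
GSO/shinfo question above is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN 14

/* Hypothetical userspace stand-in for the skb fields discussed above. */
struct pkt_model {
	uint8_t *data;      /* start of the L2 header */
	size_t len;         /* bytes from data to the end of the packet */
	uint16_t protocol;  /* cached EtherType, host byte order */
};

/* EtherType sits at bytes 12..13 of the Ethernet header, big endian. */
static uint16_t read_ethertype(const uint8_t *p)
{
	return (uint16_t)((p[12] << 8) | p[13]);
}

/*
 * orig_data is where the packet started before the program ran,
 * new_data is where the program left it. A negative offset means
 * headroom was consumed (encap), a positive one means bytes were
 * stripped (decap). Returns 1 if the cached metadata was refreshed.
 */
static int fixup_after_prog(struct pkt_model *pkt, uint8_t *orig_data,
			    uint8_t *new_data)
{
	ptrdiff_t off = new_data - orig_data;

	if (off == 0)
		return 0; /* nothing moved, cached protocol is still valid */

	/* The packet must still hold a full Ethernet header. */
	assert((ptrdiff_t)pkt->len - off >= ETH_HLEN);

	pkt->data = new_data;
	pkt->len = (size_t)((ptrdiff_t)pkt->len - off);
	/* The cached EtherType may be stale, so read it again. */
	pkt->protocol = read_ethertype(pkt->data);
	return 1;
}
```

In this model an encap that prepends a new Ethernet header changes
both the length and the cached protocol, which is why a plain
pull/push of the old mac_len would leave stale metadata behind.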

Also, what about encap/decap: should the inner skb headers get
updated as well, along with skb->encapsulation, etc.? How do we
handle checksumming on this layer?

> In case of cls_bpf when we mess with skb sizes we always
> adjust skb metafields in helpers, so there it's fine
> and __skb_pull(skb, skb->mac_len); is enough.
> Here we need to be a bit more careful.

In cls_bpf I was looking into something generic and fast for
encap/decap, like bpf_xdp_adjust_head() but for skbs. The problem is
that skbs can be received from ingress/egress and transmitted
further from cls_bpf to ingress/egress, so keeping the skb metadata
correct and up to date without exposing skb implementation details
such as header pointers to users is crucial; otherwise these can
get messed up, potentially affecting the rest of the system. We
restricted the helpers in cls_bpf to avoid that. Perhaps we could
make simpler assumptions when this generic callback is known to be
called out of a physical driver's rx path, but when the packet is
already an skb (as Alexei's thoughts below mention) ...

>>   static int netif_receive_skb_internal(struct sk_buff *skb)
>>   {
>>   	int ret;
>> @@ -4258,6 +4336,21 @@ static int netif_receive_skb_internal(struct sk_buff *skb)
>>
>>   	rcu_read_lock();
>>
>> +	if (static_key_false(&generic_xdp_needed)) {
>> +		struct bpf_prog *xdp_prog = rcu_dereference(skb->dev->xdp_prog);
>> +
>> +		if (xdp_prog) {
>> +			u32 act = netif_receive_generic_xdp(skb, xdp_prog);
>
> That's indeed the best attachment point in the stack.
> I was trying to see whether it can be lowered into something like
> dev_gro_receive(), but not everyone calls it.
> Another option to put it into eth_type_trans() itself, then
> there are no problems with gro, l2 headers, and adjust_head,
> but changing all drivers is too much.
>
>> +
>> +			if (act != XDP_PASS) {
>> +				rcu_read_unlock();
>> +				if (act == XDP_TX)
>> +					dev_queue_xmit(skb);
>
> It should be fine. For cls_bpf we do recursion check __bpf_tx_skb()
> but I forgot specific details. May be here it's fine as-is.
> Daniel, do we need recursion check here?

Yeah, Willem is correct. That check was for the sch_handle_egress()
to sch_handle_egress() case, as that is otherwise not accounted for
by the main xmit_recursion check we have in __dev_queue_xmit().
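
For readers unfamiliar with this kind of guard, the general idea of a
depth counter around a re-entrant transmit path can be sketched in
userspace as below. This is only an illustration under assumptions:
model_xmit(), model_xmit_depth and MODEL_RECURSION_LIMIT are
hypothetical names, the limit value is arbitrary, and the kernel keeps
its counter per CPU rather than in a single static variable.

```c
#include <assert.h>

/* Arbitrary illustrative limit, not taken from the kernel. */
#define MODEL_RECURSION_LIMIT 10

/* Hypothetical stand-in for a per-CPU nesting counter. */
static int model_xmit_depth;

/*
 * Models a transmit path that may re-enter itself, e.g. when a
 * program on the egress hook redirects the packet to another egress
 * hook. redirects_remaining is how many more times the packet would
 * be redirected. Returns 0 on success, -1 if the nesting limit was
 * hit and the packet would be dropped.
 */
static int model_xmit(int redirects_remaining)
{
	int ret = 0;

	if (model_xmit_depth > MODEL_RECURSION_LIMIT)
		return -1; /* too deeply nested, refuse to go further */

	model_xmit_depth++;
	if (redirects_remaining > 0)
		ret = model_xmit(redirects_remaining - 1);
	model_xmit_depth--;

	/* The counter is balanced on every return path. */
	assert(model_xmit_depth >= 0);
	return ret;
}
```

A path that bypasses the function holding the counter never bumps it,
which is the situation an extra check has to cover.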

Thanks,
Daniel
