From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alexei Starovoitov Subject: Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter Date: Sat, 9 Apr 2016 10:00:02 -0700 Message-ID: <20160409170000.GA53526@ast-mbp.thefacebook.com> References: <1460090930-11219-1-git-send-email-bblanco@plumgrid.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Brenden Blanco , "David S. Miller" , Linux Kernel Network Developers , Or Gerlitz , Daniel Borkmann , Jesper Dangaard Brouer , Eric Dumazet , Edward Cree , john fastabend , Thomas Graf , Johannes Berg , eranlinuxmellanox@gmail.com, Lorenzo Colitti To: Tom Herbert Return-path: Received: from mail-pf0-f170.google.com ([209.85.192.170]:33177 "EHLO mail-pf0-f170.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753818AbcDIRAI (ORCPT ); Sat, 9 Apr 2016 13:00:08 -0400 Received: by mail-pf0-f170.google.com with SMTP id 184so95217236pff.0 for ; Sat, 09 Apr 2016 10:00:07 -0700 (PDT) Content-Disposition: inline In-Reply-To: Sender: netdev-owner@vger.kernel.org List-ID: On Sat, Apr 09, 2016 at 08:17:04AM -0300, Tom Herbert wrote: > > > > +/* user return codes for PHYS_DEV prog type */ > > +enum bpf_phys_dev_action { > > + BPF_PHYS_DEV_DROP, > > + BPF_PHYS_DEV_OK, > > I don't like OK. Maybe this mean LOCAL. We also need FORWARD (not sure > how to handle GRO yet). > > I would suggest that we format the return code as code:subcode, where > the above are codes. subcode is relevant to major code. For instance > in forwarding the subcodes indicate a forwarding instruction (maybe a > queue). DROP subcodes might answer why. for tc redirect we use hidden percpu variable to pass additional info together with return code. The cost of it is extra bpf_redirect() call. Here we can do better and embed such info for xmit, but subcodes for drop is slippery slop, since it's adding concepts to design that are not going to be used by everyone. If necessary bpf programs can count drop reasons internally. Drops due to protocol!=ipv6 or drops due to ip frags present will be program internal reasons. No need to expose them in api. We need to get xmit part implemented first and see how it looks before deciding on this part of api. Right now I think we do not need tx queue number actually. The prog should just return 'XMIT' and xdp(driver) side will decide which tx queue to use. > One other API issue is how to deal with encapsulation. In this case a > header may be prepended to the packet, I assume there are BPF helper > functions and we don't need to return a new length or start? a bit of history: for tc+bpf we've been trying to come up with clean helpers to do header push/pop and it was very difficult, since skb keeps a ton of metedata about header offsets, csum offsets, encap flag, etc we've lost the count on number of different approaches we've implemented and discarded. For XDP there is no such issue. Likely we'll have single bpf_packet_change(ctx, off, len) helper that will grow(len) or trim(-len) bytes at offset(off) in the packet. ctx->len will be adjusted automatically by the helper. The headroom, tailroom will not be exposed and will never be known to the bpf side. It's up to the helper and the driver to decide how to insert N bytes at offset M. If the driver reserved headroom in dma buffer, it can grow into it, if not it can grow tail and move the whole packet. For performance reasons we obviously want some headroom in dma buffer, but it's not exposed to bpf. But it could be that directly adjusting ctx->len and ctx->data is faster. For cls_bpf ctx->data is hidden and packet access is done via special instructions and helpers. For XDP we can hopefully do better and do packet access with direct loads. I outlined that plan in the previous thread.