From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jesper Dangaard Brouer
Subject: Re: [PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring
Date: Wed, 1 Aug 2018 17:09:49 +0200
Message-ID: <20180801170949.5bf6101e@redhat.com>
References: <1532947431-2737-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>
	<1532947431-2737-5-git-send-email-makita.toshiaki@lab.ntt.co.jp>
	<20180731122603.27355719@redhat.com>
	<20180731144646.70e2171d@redhat.com>
	<90f355ef-1e56-5f12-ab78-a19c83fc9253@lab.ntt.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Cc: Daniel Borkmann, netdev@vger.kernel.org, Jakub Kicinski,
	John Fastabend, "Karlsson, Magnus", Björn Töpel, brouer@redhat.com
To: Toshiaki Makita, Alexei Starovoitov
Return-path:
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:57780 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S2389529AbeHAQ4G (ORCPT ); Wed, 1 Aug 2018 12:56:06 -0400
In-Reply-To: <90f355ef-1e56-5f12-ab78-a19c83fc9253@lab.ntt.co.jp>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, 1 Aug 2018 14:41:08 +0900
Toshiaki Makita wrote:

> On 2018/07/31 21:46, Jesper Dangaard Brouer wrote:
> > On Tue, 31 Jul 2018 19:40:08 +0900
> > Toshiaki Makita wrote:
> >
> >> On 2018/07/31 19:26, Jesper Dangaard Brouer wrote:
> >>>
> >>> Context needed from: [PATCH v6 bpf-next 2/9] veth: Add driver XDP
> >>>
> >>> On Mon, 30 Jul 2018 19:43:44 +0900
> >>> Toshiaki Makita wrote:
> >>> [...]
> >>>
> >>> Here you are adding an assumption that struct xdp_frame is always
> >>> located at the top of the packet-data area. I tried hard not to add
> >>> such a dependency! You can calculate the beginning of the frame from
> >>> the xdp_frame->data pointer.
> >>>
> >>> Why not add such a dependency? Because for AF_XDP zero-copy, we
> >>> cannot make such an assumption.
> >>>
> >>> Currently, when an RX-queue is in AF-XDP-ZC mode (MEM_TYPE_ZERO_COPY)
> >>> the packet will get dropped when calling convert_to_xdp_frame(), but
> >>> as the TODO comment in convert_to_xdp_frame() indicates, this is not
> >>> the end-goal.
> >>>
> >>> The comment in convert_to_xdp_frame() indicates we need a full
> >>> alloc+copy, but that is actually not necessary, if we can just use
> >>> another memory area for struct xdp_frame, and a pointer to data.
> >>> Thus allowing devmap-redir to work with ZC, and allowing cpumap-redir
> >>> to do the copy on the remote CPU.
> >>
> >> Thanks for pointing this out.
> >> It seems you are saying the xdp_frame area is not reusable. That means
> >> we reduce the usable headroom on every REDIRECT. I wanted to avoid
> >> this, but actually it is impossible, right?
> >
> > I'm not sure I understand fully... does this have something to do with
> > the below memset?
>
> Sorry for not being so clear...
> It has something to do with the memset as well, but mainly I was talking
> about XDP_TX and REDIRECT introduced in patch 8. On REDIRECT,
> dev_map_enqueue() calls convert_to_xdp_frame(), so we use the headroom
> for struct xdp_frame on REDIRECT. If we don't reuse the xdp_frame region
> of the original xdp packet, we reduce the headroom size each time on
> REDIRECT. When ZC is used, in the future xdp_frame can be non-contiguous
> with the buffer, so we cannot reuse the xdp_frame region in
> convert_to_xdp_frame()? But the current convert_to_xdp_frame()
> implementation requires the xdp_frame region in the headroom, so I think
> I cannot avoid this dependency now.
>
> SKB has a similar problem if we cannot reuse it. It can be passed to a
> bridge and redirected to another veth which has driver XDP. In that case
> we need to reallocate the page if we have reduced the headroom, because
> sufficient headroom is required for XDP processing for now (can we
> remove this requirement actually?).

Okay, now I understand.
Your changes allow multiple levels of XDP_REDIRECT between/into other
veth net_devices. This is very interesting and exciting stuff, but also
a bit scary when thinking about whether we got the life-time correct for
the different memory objects.

You have convinced me. We should not sacrifice/reduce the headroom this
way. I'll also fix up cpumap.

To avoid the performance penalty of the memset, I propose that we just
clear the xdp_frame->data pointer. But let's implement it via a common
sanitize/scrub function.

> > When cpumap generates an SKB for the netstack, we sacrifice/reduce
> > the available SKB headroom, by reducing the headroom by the xdp_frame
> > size in convert_to_xdp_frame():
> >
> >   xdp_frame->headroom = headroom - sizeof(*xdp_frame)
> >
> > In order to avoid doing such a memset of this area: we are actually
> > only worried about exposing the 'data' pointer, thus we could just
> > clear that. (See commit 6dfb970d3dbd; this is because Alexei is
> > planning to move from CAP_SYS_ADMIN to the lesser-privileged
> > CAP_NET_ADMIN.)
> >
> > See commits:
> >  97e19cce05e5 ("bpf: reserve xdp_frame size in xdp headroom")
> >  6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")
>
> We have talked about that...
> https://patchwork.ozlabs.org/patch/903536/
>
> The memset was introduced per your feedback, but I'm still not sure if
> we need it. In general the headroom is not cleared after allocation in
> drivers, so unprivileged users should not see it anyway, no matter
> whether it contains an xdp_frame or not...

I actually got this request from Alexei. That is why I implemented it.
Personally I don't think this clearing is really needed, until someone
actually makes the TC/cls_act BPF hook CAP_NET_ADMIN.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer