Date: Tue, 5 Feb 2019 19:56:19 -0800
From: Stanislav Fomichev
To: Alexei Starovoitov
Cc: Willem de Bruijn, Stanislav Fomichev, Network Development, David Miller, Alexei Starovoitov, Daniel Borkmann, simon.horman@netronome.com, Willem de Bruijn
Subject: Re: [RFC bpf-next 0/7] net: flow_dissector: trigger BPF hook when called from eth_get_headlen
Message-ID: <20190206035619.GG10769@mini-arch>
References: <20190205173629.160717-1-sdf@google.com> <20190205204003.GB10769@mini-arch> <20190206004714.pz44evow5uwgvt4x@ast-mbp.dhcp.thefacebook.com> <20190206005931.GF10769@mini-arch> <20190206031215.qldeh7pfgqr3frg3@ast-mbp>
In-Reply-To: <20190206031215.qldeh7pfgqr3frg3@ast-mbp>
X-Mailing-List: netdev@vger.kernel.org

On 02/05, Alexei Starovoitov wrote:
> On Tue, Feb 05, 2019 at 04:59:31PM -0800, Stanislav Fomichev wrote:
> > On 02/05, Alexei Starovoitov wrote:
> > > On Tue, Feb 05, 2019 at 12:40:03PM -0800, Stanislav Fomichev wrote:
> > > > On 02/05, Willem de Bruijn wrote:
> > > > > On Tue, Feb 5, 2019 at 12:57 PM Stanislav Fomichev wrote:
> > > > > >
> > > > > > Currently, when eth_get_headlen calls flow dissector, it doesn't pass any
> > > > > > skb. Because we use passed skb to lookup associated networking namespace
> > > > > > to find whether we have a BPF program attached or not, we always use
> > > > > > C-based flow dissector in this case.
> > > > > >
> > > > > > The goal of this patch series is to add new networking namespace argument
> > > > > > to the eth_get_headlen and make BPF flow dissector programs be able to
> > > > > > work in the skb-less case.
> > > > > >
> > > > > > The series goes like this:
> > > > > > 1. introduce __init_skb and __init_skb_shinfo; those will be used to
> > > > > >    initialize temporary skb
> > > > > > 2. introduce skb_net which can be used to get networking namespace
> > > > > >    associated with an skb
> > > > > > 3. add new optional network namespace argument to __skb_flow_dissect and
> > > > > >    plumb through the callers
> > > > > > 4. add new __flow_bpf_dissect which constructs temporary on-stack skb
> > > > > >    (using __init_skb) and calls BPF flow dissector program
> > > > >
> > > > > The main concern I see with this series is this cost of skb zeroing
> > > > > for every packet in the device driver receive routine, *independent*
> > > > > from the real skb allocation and zeroing which will likely happen
> > > > > later.
> > > > Yes, plus ~200 bytes on the stack for the callers.
> > > >
> > > > Not sure how visible this zeroing though, I can probably try to get some
> > > > numbers from BPF_PROG_TEST_RUN (running current version vs running with
> > > > on-stack skb).
> > >
> > > imo extra 256 byte memset for every packet is non starter.
> > We can put pre-allocated/initialized skbs without data into percpu or even
> > use pcpu_freelist_pop/pcpu_freelist_push to make sure we don't have to think
> > about having multiple percpu for irq/softirq/process contexts.
> > Any concerns with that approach?
> > Any other possible concerns with the overall series?
>
> I'm missing why the whole thing is needed.
> You're saying:
> " make BPF flow dissector programs be able to work in the skb-less case".
> What does it mean specifically?
> The only non-skb case is XDP.
> Are you saying you want flow_dissector prog to be run in XDP?

eth_get_headlen, which drivers call on the RX path on a chunk of data to
guesstimate the length of the headers, calls the flow dissector without an
skb (__skb_flow_dissect is a weird interface in that it accepts either an
skb or data+len). Right now, there is no way to trigger the BPF flow
dissector in this case (we don't have an skb to get the associated
namespace etc.). The patch series tries to fix that, so that we always
trigger the BPF program if one is attached to the device's namespace.

Context:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ba38943ba190eb6a494262003e23187d1b40fb4
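
To make the skb-or-data+len problem concrete, here is a rough userspace
sketch of the dispatch. It is not kernel code: every toy_* name is
hypothetical, the "BPF program" is just a function pointer, and the 14 and
18 byte header lengths are made-up values. It only illustrates why the
data+len path never reaches an attached program today, and how an explicit
namespace argument would change that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the kernel's real types. */
struct toy_net {
	/* Non-NULL when a "BPF" dissector is attached to this namespace. */
	int (*bpf_dissector)(const uint8_t *data, size_t len);
};

struct toy_skb {
	struct toy_net *net;
	const uint8_t *data;
	size_t len;
};

/* Built-in fallback: pretends every frame has a 14-byte header. */
static int toy_c_dissect(const uint8_t *data, size_t len)
{
	(void)data;
	return len < 14 ? (int)len : 14;
}

/* Stand-in for an attached program: reports an 18-byte header. */
static int toy_bpf_prog(const uint8_t *data, size_t len)
{
	(void)data;
	return len < 18 ? (int)len : 18;
}

/* Current shape: the namespace is only reachable through the skb. */
static int toy_dissect_old(const struct toy_skb *skb,
			   const uint8_t *data, size_t len)
{
	if (skb) {
		if (skb->net && skb->net->bpf_dissector)
			return skb->net->bpf_dissector(skb->data, skb->len);
		return toy_c_dissect(skb->data, skb->len);
	}
	/* data+len callers: no skb, no namespace, always the C path. */
	assert(data != NULL);
	return toy_c_dissect(data, len);
}

/* Sketch of the proposed shape: the caller may pass the namespace. */
static int toy_dissect_new(struct toy_net *net, const struct toy_skb *skb,
			   const uint8_t *data, size_t len)
{
	if (skb) {
		if (!net)
			net = skb->net;
		data = skb->data;
		len = skb->len;
	}
	assert(data != NULL);
	if (net && net->bpf_dissector)
		return net->bpf_dissector(data, len);
	return toy_c_dissect(data, len);
}
```

With a namespace that has toy_bpf_prog attached, toy_dissect_old(NULL,
frame, len) still takes the C path, while toy_dissect_new(&ns, NULL, frame,
len) runs the attached program. The sketch deliberately leaves out the
temporary skb that the series constructs for the BPF program's context.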