From: Toke Høiland-Jørgensen
To: Jakub Kicinski
Cc: Jesper Dangaard Brouer, David Miller, netdev@vger.kernel.org,
    Daniel Borkmann, Alexei Starovoitov
Subject: Re: [PATCH net-next 1/2] xdp: Always use a devmap for XDP_REDIRECT to a device
Date: Tue, 26 Feb 2019 12:00:30 +0100
Message-ID: <87ef7urf01.fsf@toke.dk>
In-Reply-To: <20190225104757.75b622c9@cakuba.netronome.com>
References: <155075021399.13610.12521373406832889226.stgit@alrua-x1>
    <20190221163627.7b8aa2ce@cakuba.netronome.com>
    <87va1cgmg1.fsf@toke.dk>
    <20190222133734.1880a88d@cakuba.netronome.com>
    <20190223114343.5813f18a@carbon>
    <875ztawvqh.fsf@toke.dk>
    <20190225104757.75b622c9@cakuba.netronome.com>

Jakub Kicinski writes:

> On Sat, 23 Feb 2019 13:11:02 +0100, Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer writes:
>> > On Fri, 22 Feb 2019 13:37:34 -0800 Jakub Kicinski wrote:
>> >> On Fri, 22 Feb 2019 11:13:50 +0100, Toke Høiland-Jørgensen wrote:
>> >> > Jakub Kicinski writes:
>> >> > > On Thu, 21 Feb 2019 12:56:54 +0100, Toke Høiland-Jørgensen wrote:
>> > [...]
>> >> > >
>> >> > > BPF programs don't obey by netns boundaries.
>> >> > > The fact the program is
>> >> > > verified in one ns doesn't mean this is the only ns it will be used in :(
>> >> > > Meaning if any program is using the redirect map you may need a secret
>> >> > > map in every ns.. no?
>> >> >
>> >> > Ah, yes, good point. Totally didn't think about the fact that load and
>> >> > attach are decoupled. Hmm, guess I'll just have to move the call to
>> >> > alloc_default_map() to the point where the program is attached to an
>> >> > interface, then...
>> >>
>> >> Possibly.. and you also need to handle the case where interface with a
>> >> program attached is moved, no?
>>
>> Yup, alloc on attach was easy enough; the moving turns out to be the
>> tricky part :)
>>
>> > True, we need to handle if e.g. a veth gets an XDP program attached and
>> > then is moved into a network namespace (as I've already explained to
>> > Toke in a meeting).
>>
>> Yeah, I had somehow convinced myself that the XDP program was being
>> removed when the interface was being torn down before moving between
>> namespaces. Jesper pointed out that this was not in fact the case... :P
>>
>> > I'm still not sure how to handle this...
>>
>> There are a couple of options, I think. At least:
>>
>> 1. Maintain a flag on struct net_device indicating that this device
>>    needs the redirect map allocated, and react to that when interfaces
>>    are being moved.
>>
>> 2. Lookup the BPF program by ID (which we can get from the driver) on
>>    move, and react to the program flag.
>>
>> 3. Keep the allocation on program load, but allocate maps for all active
>>    namespaces (which would probably need a refcnt mechanism to
>>    deallocate things again).
>>
>> I think I'm leaning towards #2; possibly combined with a refcnt so we
>> can actually deallocate the map in the root namespace when it's not
>> needed anymore.
>
> Okay.. what about tail calls?
> I think #3 is most reasonable
> complexity-wise, or some mix of #2 and #3 - cnt the programs with
> legacy redirects, and then allocate the resources if cnt && name space
> has any XDP program attached.

Yeah, I have that more or less working; except I forgot about tail
calls, but that should not be too difficult to fix.

> Can users really not be told to just use the correct helper? ;)

Experience would suggest not; users tend to use the simplest API that
gets their job done. And then wonder why they don't get the nice
performance numbers they were "promised". And, well, I tend to agree
that it's not terribly friendly to just go "use this other more
complicated API if you want proper performance". If we really mean that,
then we should formally deprecate xdp_redirect() as an API, IMO :)

-Toke
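
For illustration only, here is a rough userspace model of the "mix of #2
and #3" counting scheme quoted above: keep one global count of loaded
programs that use the legacy redirect, keep a per-namespace count of
attached XDP programs, and have the default map exist in a namespace only
while both are non-zero. Every name below (ns_state, prog_load_legacy(),
dev_move() and so on) is hypothetical and is not taken from the patch or
from the kernel; locking, error handling and the real map allocation are
left out. Counting at load time rather than per attached program is, as
far as I can tell, what would let this cope with tail calls, since the
attached program need not be the one doing the redirect.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NS 4	/* arbitrary size for this model */

/* Hypothetical per-namespace state; not a kernel structure. */
struct ns_state {
	int xdp_attached;	/* XDP programs attached to devices in this ns */
	bool has_default_map;	/* stands in for the hidden default devmap */
};

/* Loaded programs that use the legacy (non-map) redirect, in any ns. */
static int legacy_redirect_progs;
static struct ns_state ns_table[MAX_NS];

/* The map exists iff a legacy program is loaded and the ns has XDP attached. */
static void ns_sync(struct ns_state *ns)
{
	ns->has_default_map = legacy_redirect_progs > 0 && ns->xdp_attached > 0;
}

static void sync_all(void)
{
	for (int i = 0; i < MAX_NS; i++)
		ns_sync(&ns_table[i]);
}

/* Program load/unload changes the global count, so every ns is re-checked. */
static void prog_load_legacy(void)
{
	legacy_redirect_progs++;
	sync_all();
}

static void prog_unload_legacy(void)
{
	assert(legacy_redirect_progs > 0);
	legacy_redirect_progs--;
	sync_all();
}

/* Attach/detach only affects the namespace the device lives in. */
static void dev_attach(int ns)
{
	assert(ns >= 0 && ns < MAX_NS);
	ns_table[ns].xdp_attached++;
	ns_sync(&ns_table[ns]);
}

static void dev_detach(int ns)
{
	assert(ns >= 0 && ns < MAX_NS);
	assert(ns_table[ns].xdp_attached > 0);
	ns_table[ns].xdp_attached--;
	ns_sync(&ns_table[ns]);
}

/* Moving a device with a program attached is a detach plus an attach. */
static void dev_move(int from, int to)
{
	dev_detach(from);
	dev_attach(to);
}
```

In this model, moving an interface with a program attached (the veth
case above) frees the map in the old namespace if that was its last
attachment and allocates one in the new namespace, without having to
look at the program itself.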