From: Toke Høiland-Jørgensen
To: Toshiaki Makita, John Fastabend, Alexei Starovoitov, Daniel Borkmann,
 Martin KaFai Lau, Song Liu, Yonghong Song, "David S. Miller",
 Jakub Kicinski, Jesper Dangaard Brouer, Jamal Hadi Salim, Cong Wang,
 Jiri Pirko, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
 Pravin B Shelar
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, William Tu,
 Stanislav Fomichev
Subject: Re: [RFC PATCH v2 bpf-next 00/15] xdp_flow: Flow offload to XDP
References: <20191018040748.30593-1-toshiaki.makita1@gmail.com>
 <5da9d8c125fd4_31cf2adc704105c456@john-XPS-13-9370.notmuch>
 <22e6652c-e635-4349-c863-255d6c1c548b@gmail.com>
 <5daf34614a4af_30ac2b1cb5d205bce4@john-XPS-13-9370.notmuch>
 <87h840oese.fsf@toke.dk>
 <5db128153c75_549d2affde7825b85e@john-XPS-13-9370.notmuch>
 <87sgniladm.fsf@toke.dk> <87zhhmrz7w.fsf@toke.dk>
 <87zhhhnmg8.fsf@toke.dk>
 <640418c3-54ba-cd62-304f-fd9f73f25a42@gmail.com>
 <87blthox30.fsf@toke.dk> <87lfsiocj5.fsf@toke.dk>
 <6e08f714-6284-6d0d-9cbe-711c64bf97aa@gmail.com>
 <87k17xcwoq.fsf@toke.dk>
Date: Fri, 22 Nov 2019 12:54:53 +0100
Message-ID: <8736eg5do2.fsf@toke.dk>

Toshiaki Makita writes:

> On 2019/11/18 19:20, Toke Høiland-Jørgensen wrote:
>> Toshiaki Makita writes:
>>
>> [... trimming the context a bit ...]
>>
>>>>>> Take your example of TC rules: You were proposing a flow like this:
>>>>>>
>>>>>> Userspace TC rule -> kernel rule table -> eBPF map -> generated XDP
>>>>>> program
>>>>>>
>>>>>> Whereas what I mean is that we could do this instead:
>>>>>>
>>>>>> Userspace TC rule -> kernel rule table
>>>>>>
>>>>>> and separately
>>>>>>
>>>>>> XDP program -> bpf helper -> lookup in kernel rule table
>>>>>
>>>>> Thanks, now I see what you mean.
>>>>> You expect an XDP program like this, right?
>>>>>
>>>>> int xdp_tc(struct xdp_md *ctx)
>>>>> {
>>>>> 	int act = bpf_xdp_tc_filter(ctx);
>>>>> 	return act;
>>>>> }
>>>>
>>>> Yes, basically, except that the XDP program would need to parse the
>>>> packet first, and bpf_xdp_tc_filter() would take a parameter struct with
>>>> the parsed values. See the usage of bpf_fib_lookup() in
>>>> bpf/samples/xdp_fwd_kern.c
>>>>
>>>>> But doesn't this way lose a chance to reduce/minimize the program to
>>>>> only use necessary features for this device?
>>>>
>>>> Not necessarily. Since the BPF program does the packet parsing and fills
>>>> in the TC filter lookup data structure, it can limit what features are
>>>> used that way (e.g., if I only want to do IPv6, I just parse the v6
>>>> header, ignore TCP/UDP, and drop everything that's not IPv6). The lookup
>>>> helper could also have a flag argument to disable some of the lookup
>>>> features.
>>>
>>> It's unclear to me how to configure that.
>>> Use options when attaching the program? Something like
>>> $ xdp_tc attach eth0 --only-with ipv6
>>> But can users always determine their necessary features in advance?
>>
>> That's what I'm doing with xdp-filter now. But the answer to your second
>> question is likely to be 'probably not', so it would be good to not have
>> to do this :)
>>
>>> Frequent manual reconfiguration when TC rules frequently changes does
>>> not sound nice. Or, add hook to kernel to listen any TC filter event
>>> on some daemon and automatically reload the attached program?
>>
>> Doesn't have to be a kernel hook; we could enhance the userspace tooling
>> to do it. Say we integrate it into 'tc':
>>
>> - Add a new command 'tc xdp_accel enable --features [ipv6,etc]'
>> - When adding new rules, add the following logic:
>>   - Check if XDP acceleration is enabled
>>   - If it is, check whether the rule being added fits into the current
>>     'feature set' loaded on that interface.
>>   - If the rule needs more features, reload the XDP program to one
>>     with the needed additional features.
>>   - Or, alternatively, just warn the user and let them manually
>>     replace it?
>
> Ok, but there are other userspace tools to configure tc in wild.
> python and golang have their own netlink library project.
> OVS embeds TC netlink handling code in itself. There may be more tools like this.
> I think at least we should have rtnl notification about TC and monitor it
> from daemon, if we want to reload the program from userspace tools.

A daemon would be one way to do this in cases where it needs to be
completely dynamic. My guess is that there are lots of environments
where that is not required, and where a user/administrator could
realistically specify ahead of time which feature set they want to
enable XDP acceleration for. So in my mind the way to go about this is
to implement the latter first, then add dynamic reconfiguration of it on
top when (or if) it turns out to be necessary...

-Toke
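
To make the 'feature set' check from the quoted list a bit more concrete
(is acceleration enabled, does the new rule fit what is loaded, otherwise
reload or warn), here is a rough userspace sketch in C. Everything in it is
hypothetical: struct xdp_accel_state, xdp_accel_check_rule() and the
XDP_ACCEL_F_* bits are names made up for illustration, not anything that
exists in tc, iproute2 or the kernel, and a real implementation would have
to derive the rule's feature bits from the actual TC filter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits an XDP acceleration program could be built with. */
enum xdp_accel_feature {
	XDP_ACCEL_F_IPV4 = 1u << 0,
	XDP_ACCEL_F_IPV6 = 1u << 1,
	XDP_ACCEL_F_TCP  = 1u << 2,
	XDP_ACCEL_F_UDP  = 1u << 3,
};

/* Outcome of checking a new rule against the loaded feature set. */
enum xdp_accel_action {
	XDP_ACCEL_NOT_ENABLED,	/* no acceleration on this interface */
	XDP_ACCEL_RULE_FITS,	/* loaded program already covers the rule */
	XDP_ACCEL_NEEDS_RELOAD,	/* reload with more features, or warn the user */
};

/* Hypothetical per-interface state kept by the userspace tool. */
struct xdp_accel_state {
	bool enabled;			/* was 'xdp_accel enable' run? */
	uint32_t loaded_features;	/* features of the attached program */
};

/*
 * Decide what to do when a rule needing 'rule_features' is added.
 * On return, *missing holds the feature bits the loaded program lacks
 * (zero unless the result is XDP_ACCEL_NEEDS_RELOAD).
 */
enum xdp_accel_action
xdp_accel_check_rule(const struct xdp_accel_state *st,
		     uint32_t rule_features, uint32_t *missing)
{
	*missing = 0;

	if (!st->enabled)
		return XDP_ACCEL_NOT_ENABLED;

	/* Bits the rule needs that are absent from the loaded set. */
	*missing = rule_features & ~st->loaded_features;

	return *missing ? XDP_ACCEL_NEEDS_RELOAD : XDP_ACCEL_RULE_FITS;
}
```

With an interface enabled for IPv6 only, a rule needing IPv6 and TCP would
come back as XDP_ACCEL_NEEDS_RELOAD with the TCP bit in *missing; whether
the tool then reloads the program or just warns is the policy question
from the list above.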