Date: Tue, 14 Aug 2018 12:17:34 +0200
From: Jesper Dangaard Brouer <jbrouer@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: Alexei Starovoitov, netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 ast@kernel.org, daniel@iogearbox.net, mst@redhat.com
Subject: Re: [RFC PATCH net-next V2 0/6] XDP rx handler
Message-ID: <20180814121734.105769fa@redhat.com>
In-Reply-To: <5de3d14f-f21a-c806-51f4-b5efd7d809b7@redhat.com>
References: <1534130250-5302-1-git-send-email-jasowang@redhat.com>
 <20180814003253.fkgl6lyklc7fclvq@ast-mbp>
 <5de3d14f-f21a-c806-51f4-b5efd7d809b7@redhat.com>
Organization: Red Hat Inc.
List-ID: linux-kernel@vger.kernel.org

On Tue, 14 Aug 2018 15:59:01 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2018-08-14 08:32, Alexei Starovoitov wrote:
> > On Mon, Aug 13, 2018 at 11:17:24AM +0800, Jason Wang wrote:
> >> Hi:
> >>
> >> This series tries to implement XDP support for the rx handler. This
> >> would be useful for doing native XDP on stacked devices like macvlan,
> >> bridge or even bond.
> >>
> >> The idea is simple: let the stacked device register an XDP rx handler.
> >> When the driver returns XDP_PASS, it calls a new helper, xdp_do_pass(),
> >> which tries to pass the XDP buff to the XDP rx handler directly. The
> >> XDP rx handler may then decide how to proceed: it can consume the buff,
> >> ask the driver to drop the packet, or ask the driver to fall back to
> >> the normal skb path.
> >>
> >> A sample XDP rx handler was implemented for macvlan, and virtio-net
> >> (the mergeable buffer case) was converted to call xdp_do_pass() as an
> >> example. For ease of comparison, generic XDP support for the rx
> >> handler was also implemented.
> >>
> >> Compared to skb mode XDP on macvlan, native XDP on macvlan (XDP_DROP)
> >> shows about an 83% improvement.
> >
> > I'm missing the motivation for this.
> > It seems the performance of such a solution is ~1M packets per second.
> > Notice it was measured on virtio-net, which is kind of slow.
> >
> > What would be a real-life use case for such a feature?
>
> I had another run on top of 10G mlx4 and macvlan:
>
> XDP_DROP on mlx4: 14.0 Mpps
> XDP_DROP on macvlan: 10.05 Mpps
>
> Perf shows macvlan_hash_lookup() and the indirect call to
> macvlan_handle_xdp() are the reasons for the drop in numbers. I think
> the numbers are acceptable, and we could try more optimizations on top.
>
> So the real-life use case here is having a fast XDP path for rx
> handler based devices:
>
> - For containers, we can run XDP on macvlan (~70% of wire speed).
>   This allows a container-specific policy.
> - For VMs, we can implement a macvtap XDP rx handler on top. This
>   allows us to forward packets to the VM without building an skb in
>   the macvtap setup.
> - The idea could be used by other rx handler based devices like the
>   bridge; we may get an XDP fast-forwarding path for the bridge.
>
> > Another concern is that XDP users expect to get line-rate performance
> > and native XDP delivers it. 'Generic XDP' is a fallback-only
> > mechanism to operate on NICs that don't have native XDP yet.
>
> So I can replace the generic XDP TX routine with a native one for
> macvlan.

If you simply implement ndo_xdp_xmit() for macvlan, and instead use
XDP_REDIRECT, then we are basically done.

> > Toshiaki's veth XDP work fits the XDP philosophy and allows
> > high-speed networking to be done inside containers behind veth.
> > It's trying to get to line rate inside the container.
>
> This is one of the goals of this series as well. I agree the veth XDP
> work looks pretty fine, but I believe it only works for a specific
> setup, since it depends on XDP_REDIRECT, which is supported by only a
> few drivers (and there's no VF driver support).
The XDP_REDIRECT (RX-side) is trivial to add to drivers. It is a bad
argument that only a few drivers implement this, especially since all
drivers would also need to be extended with your proposed xdp_do_pass()
call.

(rant) The thing that is delaying XDP_REDIRECT adoption in drivers is
that it is harder to implement the TX-side, as the ndo_xdp_xmit() call
has to allocate HW TX-queue resources. If we disconnect the RX and TX
sides of redirect, then we can implement the RX-side in an afternoon.

> And in order to make it work for an end user, the XDP program still
> needs logic like a hash(map) lookup to determine the destination veth.

That _is_ the general idea behind XDP and eBPF: we need to add logic
that determines the destination. The kernel provides the basic
mechanisms for moving/redirecting packets fast, and someone else builds
an orchestration tool like Cilium that adds the needed logic.

Did you notice that we (Ahern) added bpf_fib_lookup, a FIB route lookup
accessible from XDP?

For macvlan, I imagine that we could add a BPF helper that allows you
to lookup/call macvlan_hash_lookup().

> > This XDP rx handler stuff is destined to stay at 1 Mpps speeds
> > forever, and users will get confused by forever-slow modes of XDP.
> >
> > Please explain the problem you're trying to solve.
> > "Look, here I can do XDP on top of macvlan" is not an explanation
> > of the problem.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer