From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuri Benditovich
Date: Sun, 6 Dec 2020 20:44:01 +0200
Subject: Re: [RFC PATCH v2 0/5] eBPF RSS support for virtio-net
To: Toke Høiland-Jørgensen
Cc: qemu-devel@nongnu.org, Yan Vugenfirer, Jason Wang, Andrew Melnychenko, "Michael S . Tsirkin"
In-Reply-To: <87v9dh7jy5.fsf@toke.dk>
References: <20201119111305.485202-1-andrew@daynix.com> <00e5b0a8-dfaa-2899-2501-cfe8249302ff@redhat.com> <87h7p4cmva.fsf@toke.dk> <87im9h9933.fsf@toke.dk> <87v9dh7jy5.fsf@toke.dk>

On Fri, Dec 4, 2020 at 3:57 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Yuri Benditovich <yuri.benditovich@daynix.com> writes:
>
> > On Fri, Dec 4, 2020 at 12:09 PM Toke Høiland-Jørgensen <toke@redhat.com>
> > wrote:
> >
> >> Yuri Benditovich <yuri.benditovich@daynix.com> writes:
> >>
> >> > On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <toke@redhat.com>
> >> > wrote:
> >> >
> >> >> Jason Wang <jasowang@redhat.com> writes:
> >> >>
> >> >> > On 2020/11/19 7:13 PM, Andrew Melnychenko wrote:
> >> >> >> This set of patches introduces the usage of eBPF for packet steering
> >> >> >> and RSS hash calculation:
> >> >> >> * RSS (Receive Side Scaling) is used to distribute network packets to
> >> >> >> guest virtqueues by calculating packet hash
> >> >> >> * Additionally adding support for the usage of RSS with vhost
> >> >> >>
> >> >> >> The eBPF works on kernels 5.8+
> >> >> >> On earlier kernels it fails to load and the RSS feature is reported
> >> >> >> only without vhost and implemented in 'in-qemu' software.
> >> >> >>
> >> >> >> Implementation notes:
> >> >> >> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> >> >> >> Added libbpf dependency and eBPF support.
> >> >> >> The eBPF program is part of the qemu and presented as an array
> >> >> >> of BPF ELF file data.
> >> >> >> The compilation of eBPF is not part of QEMU build and can be done
> >> >> >> using provided Makefile.ebpf (need to adjust 'linuxhdrs').
> >> >> >> Added changes to virtio-net and vhost, primary eBPF RSS is used.
> >> >> >> 'in-qemu' RSS used in the case of hash population and as a fallback
> >> >> >> option.
> >> >> >> For vhost, the hash population feature is not reported to the guest.
> >> >> >>
> >> >> >> Please also see the documentation in PATCH 5/5.
> >> >> >>
> >> >> >> I am sending those patches as RFC to initiate the discussions and get
> >> >> >> feedback on the following points:
> >> >> >> * Fallback when eBPF is not supported by the kernel
> >> >> >> * Live migration to the kernel that doesn't have eBPF support
> >> >> >> * Integration with current QEMU build
> >> >> >> * Additional usage for eBPF for packet filtering
> >> >> >>
> >> >> >> Known issues:
> >> >> >> * hash population not supported by eBPF RSS: 'in-qemu' RSS used
> >> >> >> as a fallback, also, hash population feature is not reported to guests
> >> >> >> with vhost.
> >> >> >> * big-endian BPF support: for now, eBPF isn't supported on
> >> >> >> big-endian systems. Can be added in future if required.
> >> >> >> * huge .h file with eBPF binary. The size of .h file containing
> >> >> >> eBPF binary is currently ~5K lines, because the binary is built with
> >> >> >> debug information.
> >> >> >> The binary without debug/BTF info can't be loaded by libbpf.
> >> >> >> We're looking for possibilities to reduce the size of the .h files.
> >> >> >
> >> >> >
> >> >> > Adding Toke for sharing more ideas from the eBPF side.
> >> >> >
> >> >> > We had some discussion on the eBPF issues:
> >> >> >
> >> >> > 1) Whether or not to use libbpf. Toke strongly suggests using libbpf
> >> >> > 2) Whether or not to use BTF.
> >> >> > Toke confirmed that if we don't access any
> >> >> > skb metadata, BTF is not strictly required for CO-RE. But it might still
> >> >> > be useful for e.g. debugging.
> >> >> > 3) About the huge .h file (5K lines, see patch #2 Toke). Toke confirmed that we
> >> >> > can strip debug symbols, but Yuri found some sections can't be stripped,
> >> >> > we can keep discussing here.
> >> >>
> >> >> I just tried simply running 'strip' on a sample trivial XDP program,
> >> >> which brought its size down from ~5k to ~1k and preserved the BTF
> >> >> information without me having to do anything.
> >> >>
> >> >
> >> > With our eBPF code the numbers are slightly different:
> >> > The code size without BTF: 7.5K (built without '-g')
> >> > Built with '-g': 45K
> >> > Stripped: 19K
> >> > The difference between 7.5 and 19K still seems significant, especially when
> >> > we do not use any kernel structures and do not need these BTF sections
> >>
> >> That does seem like a lot of BTF information. Did you confirm (with
> >> objdump) that it's the .BTF* sections that take up these extra 12k? Do
> >> you have some really complicated data structures in the file or
> >> something? Got a link to the source somewhere that isn't a web mailing
> >> list archive? :)
> >>
> >>
> > Looks like the extra size is related to BTF: there are 4 BTF sections that
> > take 12.5K
> >   [ 7] .BTF          PROGBITS  0000000000000000 00144c 00175d 00   0  0  1
> >   [ 8] .rel.BTF      REL       0000000000000000 002bb0 000040 10  14  7  8
> >   [ 9] .BTF.ext      PROGBITS  0000000000000000 002bf0 000cd0 00   0  0  1
> >   [10] .rel.BTF.ext  REL       0000000000000000 0038c0 000ca0 10  14  9  8
>
> Right, okay, that does not look completely outrageous with the amount of
> code and type information you have in that file.
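
The 12.5K figure can be cross-checked by summing the size column of the four rows in the listing above. This is only a minimal sketch in Python: it assumes the columns follow the usual `readelf -S` layout (name, type, address, file offset, size, all hexadecimal), and the names `BTF_SECTIONS` and `total_size` are made up for illustration.

```python
# Rows copied from the section listing quoted above.
# Assumed column order here: name, type, file offset, size (hexadecimal).
BTF_SECTIONS = [
    (".BTF", "PROGBITS", 0x00144C, 0x00175D),
    (".rel.BTF", "REL", 0x002BB0, 0x000040),
    (".BTF.ext", "PROGBITS", 0x002BF0, 0x000CD0),
    (".rel.BTF.ext", "REL", 0x0038C0, 0x000CA0),
]


def total_size(sections):
    """Sum the size column (in bytes) over all listed sections."""
    return sum(size for _name, _kind, _offset, size in sections)


if __name__ == "__main__":
    for name, kind, offset, size in BTF_SECTIONS:
        print(f"{name:<14} {kind:<9} offset=0x{offset:06x} size={size} bytes")
    print(f"total: {total_size(BTF_SECTIONS)} bytes")
```

The four sections add up to 12,557 bytes, which is consistent with the 12.5K quoted above and with the roughly 12k gap between the stripped (19K) and BTF-free (7.5K) builds.
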
>
> > All the sources are at:
> > The branch without libbpf
> > https://github.com/daynix/qemu/tree/eBPF_RFC
> > The branch with libbpf
> > https://github.com/daynix/qemu/tree/eBPF_RFCv2
> >
> > all the eBPF-related code is under qemu/ebpf directory.
>
> Ah, cool, thanks!
>
> >> In any case, while I do think it smells a little of premature
> >> optimisation, you can of course strip the BTF information until you need
> >> it. Having it around makes debugging easier (bpftool will expand your
> >> map structures for you when dumping maps, and that sort of thing), but
> >> it's not really essential if you don't need CO-RE.
> >>
> >> > This is the only reason to prefer the non-libbpf option for this specific eBPF
> >>
> >> You can still use libbpf without BTF. It's using BTF without libbpf that
> >> tends to not work so well...
> >>
> >>
> > If we build the eBPF without '-g' or strip the BTF information out of the
> > object file, libbpf crashes right after printing "libbpf: BTF is
> > required, but is missing or corrupted".
> > We did not investigate this too deeply, but at first glance it looks
> > like the presence of maps automatically makes libbpf require BTF.
>
> Ah, right. Well, you're using the BTF-based map definition syntax. So
> yeah, that does require BTF: The __uint() and __type() macros really
> expand to type definitions that are specifically crafted to be embedded
> as BTF in the file.
>

Yes, now the eBPF built without '-g' can also be loaded via libbpf and we
can enable/disable BTF as we need.
Again, thank you very much!

> You could use the old-style map definitions that don't use BTF[0], but
> BTF is really where things are going in BPF-land so I think longer term
> you'll probably end up needing it anyway.
> So going to this much trouble
> just to save 10k on binary size seems to me like it's a decision you'll
> end up regretting :)
>
> [0]
> https://github.com/xdp-project/xdp-tutorial/blob/master/basic03-map-counter/xdp_prog_kern.c#L11
>
> -Toke
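
For a rough sense of what these object sizes mean for the embedded header from the cover letter (the eBPF ELF presented as a C array), the sketch below renders a byte blob as a C array and counts the resulting lines. It is an illustration under stated assumptions, not the generator used in the patches: the 12-bytes-per-line layout, the array name `ebpf_elf_blob`, both function names, and reading "K" as 1000 bytes are all hypothetical choices.

```python
def c_array_lines(data, name="ebpf_elf_blob", per_line=12):
    """Render bytes as the lines of a C unsigned char array definition."""
    lines = [f"static const unsigned char {name}[] = {{"]
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        lines.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    return lines


def header_line_count(size_bytes, per_line=12):
    """Line count of the generated array for a blob of the given size."""
    return len(c_array_lines(bytes(size_bytes), per_line=per_line))


if __name__ == "__main__":
    # Sizes from the thread, taking "K" as 1000 bytes (an assumption).
    builds = [("built with -g", 45000), ("stripped", 19000), ("no BTF", 7500)]
    for label, size in builds:
        print(f"{label:<14} {size:>6} bytes -> {header_line_count(size)} lines")
```

The line count scales linearly with the object size, so under these assumptions the BTF sections account for most of the header length difference between the stripped and BTF-free builds.
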

