On Fri, Dec 4, 2020 at 3:57 PM Toke Høiland-Jørgensen <toke@redhat.com>
wrote:

> Yuri Benditovich <yuri.benditovich@daynix.com> writes:
>
> > On Fri, Dec 4, 2020 at 12:09 PM Toke Høiland-Jørgensen <toke@redhat.com>
> > wrote:
> >
> >> Yuri Benditovich <yuri.benditovich@daynix.com> writes:
> >>
> >> > On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <
> toke@redhat.com>
> >> > wrote:
> >> >
> >> >> Jason Wang <jasowang@redhat.com> writes:
> >> >>
> >> >> > On 2020/11/19 下午7:13, Andrew Melnychenko wrote:
> >> >> >> This set of patches introduces the usage of eBPF for packet
> steering
> >> >> >> and RSS hash calculation:
> >> >> >> * RSS(Receive Side Scaling) is used to distribute network packets
> to
> >> >> >> guest virtqueues by calculating packet hash
> >> >> >> * Additionally adding support for the usage of RSS with vhost
> >> >> >>
> >> >> >> The eBPF works on kernels 5.8+
> >> >> >> On earlier kerneld it fails to load and the RSS feature is
> reported
> >> >> >> only without vhost and implemented in 'in-qemu' software.
> >> >> >>
> >> >> >> Implementation notes:
> >> >> >> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF
> program.
> >> >> >> Added libbpf dependency and eBPF support.
> >> >> >> The eBPF program is part of the qemu and presented as an array
> >> >> >> of BPF ELF file data.
> >> >> >> The compilation of eBPF is not part of QEMU build and can be done
> >> >> >> using provided Makefile.ebpf(need to adjust 'linuxhdrs').
> >> >> >> Added changes to virtio-net and vhost, primary eBPF RSS is used.
> >> >> >> 'in-qemu' RSS used in the case of hash population and as a
> fallback
> >> >> option.
> >> >> >> For vhost, the hash population feature is not reported to the
> guest.
> >> >> >>
> >> >> >> Please also see the documentation in PATCH 5/5.
> >> >> >>
> >> >> >> I am sending those patches as RFC to initiate the discussions and
> get
> >> >> >> feedback on the following points:
> >> >> >> * Fallback when eBPF is not supported by the kernel
> >> >> >> * Live migration to the kernel that doesn't have eBPF support
> >> >> >> * Integration with current QEMU build
> >> >> >> * Additional usage for eBPF for packet filtering
> >> >> >>
> >> >> >> Known issues:
> >> >> >> * hash population not supported by eBPF RSS: 'in-qemu' RSS used
> >> >> >> as a fallback, also, hash population feature is not reported to
> >> guests
> >> >> >> with vhost.
> >> >> >> * big-endian BPF support: for now, eBPF isn't supported on
> >> >> >> big-endian systems. Can be added in future if required.
> >> >> >> * huge .h file with eBPF binary. The size of .h file containing
> >> >> >> eBPF binary is currently ~5K lines, because the binary is built
> with
> >> >> debug information.
> >> >> >> The binary without debug/BTF info can't be loaded by libbpf.
> >> >> >> We're looking for possibilities to reduce the size of the .h
> files.
> >> >> >
> >> >> >
> >> >> > Adding Toke for sharing more idea from eBPF side.
> >> >> >
> >> >> > We had some discussion on the eBPF issues:
> >> >> >
> >> >> > 1) Whether or not to use libbpf. Toke strongly suggest to use
> libbpf
> >> >> > 2) Whether or not to use BTF. Toke confirmed that if we don't
> access
> >> any
> >> >> > skb metadata, BTF is not strictly required for CO-RE. But it might
> >> still
> >> >> > useful for e.g debugging.
> >> >> > 3) About the huge (5K lines, see patch #2 Toke). Toke confirmed
> that
> >> we
> >> >> > can strip debug symbols, but Yuri found some sections can't be
> >> stripped,
> >> >> > we can keep discussing here.
> >> >>
> >> >> I just tried simply running 'strip' on a sample trivial XDP program,
> >> >> which brought its size down from ~5k to ~1k and preserved the BTF
> >> >> information without me having to do anything.
> >> >>
> >> >
> >> > With our eBPF code the numbers are slightly different:
> >> > The code size without BTF: 7.5K (built without '-g')
> >> > Built with '-g': 45K
> >> > Stripped: 19K
> >> > The difference between 7.5 and 19K still seems significant, especially
> >> when
> >> > we do not use any kernel structures and do not need these BTF sections
> >>
> >> That does seem like a lot of BTF information. Did you confirm (with
> >> objdump) that it's the .BTF* sections that take up these extra 12k? Do
> >> you have some really complicated data structures in the file or
> >> something? Got a link to the source somewhere that isn't a web mailing
> >> list archive? :)
> >>
> >>
> > Looks like the extra size is related to BTF: there are 4 BTF sections
> that
> > take 12.5K
> >   [ 7] .BTF              PROGBITS        0000000000000000 00144c 00175d
> 00
> >      0   0  1
> >   [ 8] .rel.BTF          REL             0000000000000000 002bb0 000040
> 10
> >     14   7  8
> >   [ 9] .BTF.ext          PROGBITS        0000000000000000 002bf0 000cd0
> 00
> >      0   0  1
> >   [10] .rel.BTF.ext      REL             0000000000000000 0038c0 000ca0
> 10
> >     14   9  8
>
> Right, okay, that does not look completely outrageous with the amount of
> code and type information you have in that file.
>
> > All the sources are at:
> > The branch without libbpf
> > https://github.com/daynix/qemu/tree/eBPF_RFC
> > The branch with libbpf
> > https://github.com/daynix/qemu/tree/eBPF_RFCv2
> >
> > all the eBPF-related code is under qemu/ebpf directory.
>
> Ah, cool, thanks!
>
> >> In any case, while I do think it smells a little of premature
> >> optimisation, you can of course strip the BTF information until you need
> >> it. Having it around makes debugging easier (bpftool will expand your
> >> map structures for you when dumping maps, and that sort of thing), but
> >> it's not really essential if you don't need CO-RE.
> >>
> >> > This is only reason to prefer non-libbpf option for this specific eBPF
> >>
> >> You can still use libbpf without BTF. It's using BTF without libbpf that
> >> tends to not work so well...
> >>
> >>
> > If we build the eBPF without '-g' or strip the BTF information out of the
> > object file the libbpf crashes right after issuing printout "libbpf: BTF
> is
> > required, but is missing or corrupted".
> > We did not investigate this too deeply but on the first glance it looks
> > like the presence of maps automatically makes the libbpf to require BTF.
>
> Ah, right. Well, you're using the BTF-based map definition syntax. So
> yeah, that does require BTF: The __uint() and __type() macros really
> expand to type definitions that are specifically crafted to be embedded
> as BTF in the file.
>

Yes, now the EBPF built without '-g' also can be loaded via libbpf and we
can enable/disable BTF as we need.
Again, thank you very much!



>
> You could use the old-style map definitions that don't use BTF[0], but
> BTF is really where things are going in BPF-land so I think longer term
> you'll probably end up needing it anyway. So going to this much trouble
> just to save 10k on binary size seems to me like it's a decision you'll
> end up regretting :)
>
> [0]
> https://github.com/xdp-project/xdp-tutorial/blob/master/basic03-map-counter/xdp_prog_kern.c#L11
>
> -Toke
>
>