On Fri, Dec 4, 2020 at 3:57 PM Toke Høiland-Jørgensen wrote:

> Yuri Benditovich writes:
>
> > On Fri, Dec 4, 2020 at 12:09 PM Toke Høiland-Jørgensen wrote:
> >
> >> Yuri Benditovich writes:
> >>
> >> > On Wed, Dec 2, 2020 at 4:18 PM Toke Høiland-Jørgensen <toke@redhat.com>
> >> > wrote:
> >> >
> >> >> Jason Wang writes:
> >> >>
> >> >> > On 2020/11/19 7:13 PM, Andrew Melnychenko wrote:
> >> >> >> This set of patches introduces the usage of eBPF for packet steering
> >> >> >> and RSS hash calculation:
> >> >> >> * RSS (Receive Side Scaling) is used to distribute network packets to
> >> >> >> guest virtqueues by calculating packet hash
> >> >> >> * Additionally adding support for the usage of RSS with vhost
> >> >> >>
> >> >> >> The eBPF works on kernels 5.8+
> >> >> >> On earlier kernels it fails to load and the RSS feature is reported
> >> >> >> only without vhost and implemented in 'in-qemu' software.
> >> >> >>
> >> >> >> Implementation notes:
> >> >> >> Linux TAP TUNSETSTEERINGEBPF ioctl was used to set the eBPF program.
> >> >> >> Added libbpf dependency and eBPF support.
> >> >> >> The eBPF program is part of the qemu and presented as an array
> >> >> >> of BPF ELF file data.
> >> >> >> The compilation of eBPF is not part of QEMU build and can be done
> >> >> >> using provided Makefile.ebpf (need to adjust 'linuxhdrs').
> >> >> >> Added changes to virtio-net and vhost, primary eBPF RSS is used.
> >> >> >> 'in-qemu' RSS used in the case of hash population and as a fallback
> >> >> >> option.
> >> >> >> For vhost, the hash population feature is not reported to the guest.
> >> >> >>
> >> >> >> Please also see the documentation in PATCH 5/5.
> >> >> >>
> >> >> >> I am sending those patches as RFC to initiate the discussions and get
> >> >> >> feedback on the following points:
> >> >> >> * Fallback when eBPF is not supported by the kernel
> >> >> >> * Live migration to the kernel that doesn't have eBPF support
> >> >> >> * Integration with current QEMU build
> >> >> >> * Additional usage for eBPF for packet filtering
> >> >> >>
> >> >> >> Known issues:
> >> >> >> * hash population not supported by eBPF RSS: 'in-qemu' RSS used
> >> >> >> as a fallback, also, hash population feature is not reported to guests
> >> >> >> with vhost.
> >> >> >> * big-endian BPF support: for now, eBPF isn't supported on
> >> >> >> big-endian systems. Can be added in future if required.
> >> >> >> * huge .h file with eBPF binary. The size of .h file containing
> >> >> >> eBPF binary is currently ~5K lines, because the binary is built with
> >> >> >> debug information.
> >> >> >> The binary without debug/BTF info can't be loaded by libbpf.
> >> >> >> We're looking for possibilities to reduce the size of the .h files.
> >> >> >
> >> >> >
> >> >> > Adding Toke for sharing more ideas from the eBPF side.
> >> >> >
> >> >> > We had some discussion on the eBPF issues:
> >> >> >
> >> >> > 1) Whether or not to use libbpf. Toke strongly suggests using libbpf
> >> >> > 2) Whether or not to use BTF. Toke confirmed that if we don't access any
> >> >> > skb metadata, BTF is not strictly required for CO-RE. But it might still
> >> >> > be useful for e.g. debugging.
> >> >> > 3) About the huge .h file (5K lines, see patch #2 Toke). Toke confirmed
> >> >> > that we can strip debug symbols, but Yuri found some sections can't be
> >> >> > stripped, we can keep discussing here.
> >> >>
> >> >> I just tried simply running 'strip' on a sample trivial XDP program,
> >> >> which brought its size down from ~5k to ~1k and preserved the BTF
> >> >> information without me having to do anything.
> >> >>
> >> >
> >> > With our eBPF code the numbers are slightly different:
> >> > The code size without BTF: 7.5K (built without '-g')
> >> > Built with '-g': 45K
> >> > Stripped: 19K
> >> > The difference between 7.5 and 19K still seems significant, especially when
> >> > we do not use any kernel structures and do not need these BTF sections
> >>
> >> That does seem like a lot of BTF information. Did you confirm (with
> >> objdump) that it's the .BTF* sections that take up these extra 12k? Do
> >> you have some really complicated data structures in the file or
> >> something? Got a link to the source somewhere that isn't a web mailing
> >> list archive? :)
> >>
> >>
> > Looks like the extra size is related to BTF: there are 4 BTF sections that
> > take 12.5K
> > [ 7] .BTF          PROGBITS  0000000000000000 00144c 00175d 00      0  0  1
> > [ 8] .rel.BTF      REL       0000000000000000 002bb0 000040 10     14  7  8
> > [ 9] .BTF.ext      PROGBITS  0000000000000000 002bf0 000cd0 00      0  0  1
> > [10] .rel.BTF.ext  REL       0000000000000000 0038c0 000ca0 10     14  9  8
>
> Right, okay, that does not look completely outrageous with the amount of
> code and type information you have in that file.
>
> > All the sources are at:
> > The branch without libbpf
> > https://github.com/daynix/qemu/tree/eBPF_RFC
> > The branch with libbpf
> > https://github.com/daynix/qemu/tree/eBPF_RFCv2
> >
> > all the eBPF-related code is under qemu/ebpf directory.
>
> Ah, cool, thanks!
>
> >> In any case, while I do think it smells a little of premature
> >> optimisation, you can of course strip the BTF information until you need
> >> it. Having it around makes debugging easier (bpftool will expand your
> >> map structures for you when dumping maps, and that sort of thing), but
> >> it's not really essential if you don't need CO-RE.
> >>
> >> > This is only reason to prefer non-libbpf option for this specific eBPF
> >>
> >> You can still use libbpf without BTF.
> >> It's using BTF without libbpf that
> >> tends to not work so well...
> >>
> >>
> > If we build the eBPF without '-g' or strip the BTF information out of the
> > object file, libbpf crashes right after printing "libbpf: BTF is
> > required, but is missing or corrupted".
> > We did not investigate this too deeply, but at first glance it looks
> > like the presence of maps automatically makes libbpf require BTF.
>
> Ah, right. Well, you're using the BTF-based map definition syntax. So
> yeah, that does require BTF: The __uint() and __type() macros really
> expand to type definitions that are specifically crafted to be embedded
> as BTF in the file.
>

Yes, now the eBPF built without '-g' can also be loaded via libbpf and we
can enable/disable BTF as we need.
Again, thank you very much!

> You could use the old-style map definitions that don't use BTF[0], but
> BTF is really where things are going in BPF-land so I think longer term
> you'll probably end up needing it anyway. So going to this much trouble
> just to save 10k on binary size seems to me like it's a decision you'll
> end up regretting :)
>
> [0]
> https://github.com/xdp-project/xdp-tutorial/blob/master/basic03-map-counter/xdp_prog_kern.c#L11
>
> -Toke
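
For readers following the point about the two map definition styles, here is a minimal userspace sketch of why the BTF-based syntax stops working once BTF is stripped. It is not taken from the QEMU patches. The `__uint()` and `__type()` macros below are simplified local stand-ins modeled on the ones in libbpf's bpf_helpers.h, and the names `btf_style_map`, `legacy_map_def`, `legacy_map`, `UINT_OF` and the helper functions are hypothetical. The idea it tries to show: in the BTF style every setting lives in a field's type, so it survives in the object file only as type information, whereas the old style stores plain integers as data.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: simplified stand-ins modeled on libbpf's bpf_helpers.h,
 * defined locally so this compiles as ordinary userspace C. */
#define __uint(name, val) int (*name)[val]
#define __type(name, val) __typeof__(val) *name

/* BTF-style declaration (hypothetical map): each field is a pointer whose
 * TYPE carries the setting. No field ever holds a value, so the settings
 * can only be recovered from type information such as BTF. */
struct btf_style_map {
    __uint(type, 2);          /* 2 is BPF_MAP_TYPE_ARRAY in the kernel UAPI */
    __uint(max_entries, 5);
    __type(key, uint32_t);
    __type(value, uint64_t);
};

/* Old-style declaration, modeled on the legacy struct bpf_map_def:
 * the same settings stored as ordinary integers in a data section. */
struct legacy_map_def {
    unsigned int type;
    unsigned int key_size;
    unsigned int value_size;
    unsigned int max_entries;
    unsigned int map_flags;
};

static const struct legacy_map_def legacy_map = {
    .type = 2,
    .key_size = sizeof(uint32_t),
    .value_size = sizeof(uint64_t),
    .max_entries = 5,
};

/* Recover an integer encoded by __uint(): the field points to an array of
 * `val` ints, so the array size divided by sizeof(int) gives `val`.
 * sizeof does not evaluate its operand, so nothing is dereferenced. */
#define UINT_OF(map_type, field) \
    (sizeof(*((map_type *)0)->field) / sizeof(int))

size_t btf_map_type(void)    { return UINT_OF(struct btf_style_map, type); }
size_t btf_max_entries(void) { return UINT_OF(struct btf_style_map, max_entries); }
size_t btf_key_size(void)    { return sizeof(*((struct btf_style_map *)0)->key); }
size_t btf_value_size(void)  { return sizeof(*((struct btf_style_map *)0)->value); }

/* The legacy style needs no type information: just read the data. */
size_t legacy_max_entries(void) { return legacy_map.max_entries; }
```

Calling these helpers from a small `main()` should show both styles describing the same map. The difference is where the description lives: the compiler can answer the `btf_*` questions only because it still has the types, which is presumably the position libbpf is in when it reports that BTF is required for such a map.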