On Tue, Jun 29, 2021 at 6:39 AM Jason Wang <jasowang@redhat.com> wrote:

On 2021/6/28 7:18 PM, Yuri Benditovich wrote:
> On Wed, Jun 23, 2021 at 3:47 AM Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2021/6/22 5:09 PM, Toke Høiland-Jørgensen wrote:
>>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>>
>>>> On Tue, Jun 22, 2021 at 10:25:19AM +0200, Toke Høiland-Jørgensen wrote:
>>>>> Jason Wang <jasowang@redhat.com> writes:
>>>>>
>>>>>> On 2021/6/22 11:29 AM, Yuri Benditovich wrote:
>>>>>>> On Mon, Jun 21, 2021 at 12:20 PM Jason Wang <jasowang@redhat.com> wrote:
>>>>>>>> On 2021/6/19 4:03 AM, Andrew Melnichenko wrote:
>>>>>>>>> Hi Jason,
>>>>>>>>> I've checked "kernel.unprivileged_bpf_disabled=0" on Fedora, Ubuntu,
>>>>>>>>> and Debian - no permissions are needed to update BPF maps.
>>>>>>>> How about RHEL :) ?
>>>>>>> If I'm not mistaken, the RHEL releases do not use modern kernels yet
>>>>>>> (for BPF we need 5.8+).
>>>>>>> So this will be (probably) relevant for RHEL 9. Please correct me if I'm wrong.
>>>>>> Adding Toke for more ideas on this.
>>>>> Ignore the kernel version number; we backport all of BPF to RHEL,
>>>>> basically. RHEL8.4 is up to upstream kernel 5.10, feature-wise.
>>>>>
>>>>> However, we completely disable unprivileged BPF on RHEL kernels. Also,
>>>>> there's upstream commit:
>>>>> 08389d888287 ("bpf: Add kconfig knob for disabling unpriv bpf by default")
>>>>>
>>>>> which adds a new value of '2' to the unprivileged_bpf_disabled sysctl. I
>>>>> believe this may end up being the default on Fedora as well.
>>>>>
>>>>> So any design relying on unprivileged BPF is likely to break; I'd
>>>>> suggest you look into how you can get this to work with CAP_BPF :)
>>>> QEMU will never have any capabilities. Any resources that require
>>>> privileges have to be opened by a separate privileged helper, and the
>>>> open FD then passed across to the QEMU process. This relies on the
>>>> capabilities checks only being performed at time of initial opening,
>>>> and *not* on operations performed on the already open FD.
>>> That won't work for regular map updates either, unfortunately: you still
>>> have to perform a bpf() syscall to update an element, and that is a
>>> privileged operation.
>>>
>>> You may be able to get around this by using an array map type and
>>> mmap()'ing the map contents, but I'm not sure how well that will work
>>> across process boundaries.
>>>
>>> If it doesn't, I only see two possibilities: populate the map
>>> ahead-of-time and leave it in place, or keep the privileged helper
>>> process around to perform map updates on behalf of QEMU...
>>
>> Right, and this could probably be done by extending and tracking the RSS
>> update via rx filter event.
> Jason,
> Can you please go into a bit more detail - what do you mean by 'extending
> and tracking the RSS

There's a monitor event which qemu could use to notify a
privileged application (e.g. one that has CAP_NET_ADMIN) to update the rx
filter attributes of the host networking device.

It works like this: when the rx filters are updated by the guest, qemu
generates an rx filter update event (see rxfilter_notify()) which can
be captured by the privileged application.

Then the privileged application queries the rx filter information via the
query-rx-filter command and does the proper setup.
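
As a rough, untested sketch of that flow from the helper's side: it
listens on a QMP socket for the NIC_RX_FILTER_CHANGED event and answers
each one with query-rx-filter. The socket path and apply_filters() are
placeholders I made up for illustration; a real helper would do its
privileged setup there.

```python
import json
import socket

RX_FILTER_EVENT = "NIC_RX_FILTER_CHANGED"


def build_query(name=None):
    """Build a query-rx-filter QMP command, optionally limited to one NIC."""
    cmd = {"execute": "query-rx-filter"}
    if name is not None:
        cmd["arguments"] = {"name": name}
    return cmd


def handle_message(msg):
    """Return the follow-up QMP command for an rx-filter event, else None."""
    if msg.get("event") != RX_FILTER_EVENT:
        return None
    return build_query(msg.get("data", {}).get("name"))


def apply_filters(filters):
    """Placeholder (hypothetical): the privileged setup would happen here."""
    for entry in filters:
        print("rx filter update for", entry.get("name"))


def watch(sock_path):
    """Connect to a QMP UNIX socket and react to rx-filter events."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        f = s.makefile("rw", encoding="utf-8")
        json.loads(f.readline())  # consume the QMP greeting
        f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        f.flush()
        for line in f:
            msg = json.loads(line)
            cmd = handle_message(msg)
            if cmd is not None:
                # Event seen: ask qemu for the current filter state.
                f.write(json.dumps(cmd) + "\n")
                f.flush()
            elif msg.get("return"):
                # Non-empty reply, assumed here to be the filter list.
                apply_filters(msg["return"])
```

Something like watch("/path/to/qmp.sock") would then run in the
privileged process, while qemu itself stays unprivileged.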

This was designed for macvtap, but I think it might be used for RSS as well.

The helper can monitor the rx-filter event and update the eBPF maps. But
I'm not sure if it needs some coordination with libvirt in this case.
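
To make the map-update side a bit more concrete, here is a small sketch
under heavy assumptions: query-rx-filter does not report RSS state today,
so the "rss" / "indirection-table" fields below are hypothetical, and
update_elem stands in for whatever privileged wrapper around the bpf()
map update the helper ends up using. It only shows how the helper might
turn a reported indirection table into key/value pairs for an array map.

```python
import struct


def indirection_entries(table):
    """Yield (key, value) byte pairs for a BPF array map.

    Assumes u32 keys (which array maps require) and u16 queue
    indices as values; the value size is an assumption.
    """
    for idx, queue in enumerate(table):
        yield struct.pack("=I", idx), struct.pack("=H", queue)


def sync_rss(rx_filter, update_elem):
    """Push the RSS indirection table of one rx-filter entry into a map.

    `rx_filter["rss"]` is a hypothetical extension of the query-rx-filter
    reply; `update_elem(key, value)` is a caller-supplied callback.
    Returns the number of elements written.
    """
    rss = rx_filter.get("rss")
    if not rss:
        return 0
    count = 0
    for key, value in indirection_entries(rss["indirection-table"]):
        update_elem(key, value)
        count += 1
    return count
```

Keeping the packing separate from the actual update means the same code
could be reused if the update path later moves into libvirt.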

Thanks

>> update via rx filter event'?
> Thanks,
> Yuri
>
>> Thanks
>>
>>
>>> -Toke
>>>