From: Andrew Melnichenko
Date: Fri, 11 Jun 2021 19:49:21 +0300
Subject: Re: [RFC PATCH 0/5] ebpf: Added ebpf helper for libvirtd.
To: Jason Wang
Cc: Daniel P. Berrangé, "Michael S . Tsirkin", qemu-devel@nongnu.org, Markus Armbruster, Yuri Benditovich, Yan Vugenfirer, Eric Blake

Hi,
> So I think the series is for unprivileged_bpf disabled. If I'm not
> wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> via LSM.

The main idea is to run eBPF RSS with qemu without giving qemu any
permissions. Libvirt should handle everything and pass the proper eBPF
file descriptors. The current eBPF RSS also requires CAP_SYS_ADMIN (to
bypass some limitations), and other permissions may be needed in the
future.
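
As a rough illustration of why a privileged helper is needed, the two kernel checks quoted further down in this thread can be modelled as a small Python function. This is only a sketch of the quoted snippet: `bpf_syscall_check` and its parameters are hypothetical names, and it does not model the additional CAP_SYS_ADMIN requirement mentioned above.

```python
import errno


def bpf_syscall_check(unprivileged_bpf_disabled, bpf_capable, lsm_result=0):
    """Model the two quoted checks; return 0 or a negative errno value.

    All three parameters are illustrative stand-ins for kernel state.
    """
    # First gate: with unprivileged BPF disabled, the caller must be capable.
    if unprivileged_bpf_disabled and not bpf_capable:
        return -errno.EPERM
    # Second gate: the LSM hook (security_bpf in the quote) may veto the call.
    if lsm_result < 0:
        return lsm_result
    return 0
```

Under this model, an unprivileged qemu on a host with unprivileged BPF disabled fails at the first gate, which is the case the helper is meant to cover.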

> I'm not sure this is the best. We have several examples that let libvirt
> to involve. Examples:
>
> 1) create TAP device (and the TUN_SETIFF)
>
> 2) open vhost devices

Technically, TAP/vhost are not tied to a particular qemu emulator, so
common TAP creation should fit any modern qemu. The eBPF fds (program and
maps) have to suit the interface of the current qemu; e.g. some qemu
builds may have different map structures or a different number of maps.
So qemu must get fds prepared by the helper that was built with it.
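
To make the "matching helper" idea concrete, here is a minimal sketch of how a management tool might pick the helper path out of a `query-helper-paths` reply. The reply text is copied from the QMP sample in the cover letter quoted in this thread; `find_helper_path` is a hypothetical name, and the real libvirt code may look quite different.

```python
import json

# Reply copied from the "qmp sample" in the quoted cover letter.
SAMPLE_REPLY = """
{ "return": [
    { "name": "qemu-ebpf-rss-helper",
      "path": "/usr/local/libexec/qemu-ebpf-rss-helper" }
  ]
}
"""


def find_helper_path(reply_text, helper_name="qemu-ebpf-rss-helper"):
    """Return the path reported for helper_name, or None if it is absent."""
    reply = json.loads(reply_text)
    for entry in reply.get("return", []):
        if entry.get("name") == helper_name:
            return entry.get("path")
    return None


helper_path = find_helper_path(SAMPLE_REPLY)
```

Returning None when the entry is missing mirrors the "if present" wording: an emulator that reports no helper would simply run without eBPF RSS fds.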

> I think we need an example on the detail steps for how libvirt is
> expected to use this.

The simplified workflow looks like this:
  1. Libvirt gets the "emulator" from the domain document.
  2. Libvirt queries the qemu capabilities.
  3. One of the capabilities is the "qemu-ebpf-rss-helper" path (if present).
  4. On NIC preparation, libvirt checks for a virtio-net + rss configuration.
  5. If required, "qemu-ebpf-rss-helper" is called and the fds are received
     through a unix socket.
  6. Those fds are for eBPF RSS and are passed to the child process, qemu.
  7. Qemu is launched with the virtio-net-pci properties "rss" and
     "ebpf_rss_fds".
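
Step 5 of the workflow above relies on passing open file descriptors over a unix socket. The sketch below shows that general mechanism (SCM_RIGHTS ancillary data) in Python, with a pipe standing in for the eBPF program and map fds. It is not the helper's actual protocol: `send_fds`, `recv_fds`, `demo` and the one-byte payload are assumptions made for illustration.

```python
import array
import os
import socket


def send_fds(sock, fds):
    """Send open file descriptors as SCM_RIGHTS ancillary data."""
    payload = array.array("i", fds)
    # Ancillary data must travel with at least one byte of ordinary data.
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, payload)])


def recv_fds(sock, max_fds):
    """Receive up to max_fds descriptors sent with SCM_RIGHTS."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(max_fds * fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Drop trailing bytes that do not form a whole int.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return list(fds)


def demo():
    """Pass both ends of a pipe across a socketpair and use the copies."""
    helper_end, manager_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()
    try:
        send_fds(helper_end, [read_fd, write_fd])
    finally:
        # The sender can close its copies once they are queued on the socket.
        os.close(read_fd)
        os.close(write_fd)
    new_read, new_write = recv_fds(manager_end, max_fds=4)
    try:
        os.write(new_write, b"ok")
        return os.read(new_read, 2)
    finally:
        for fd in (new_read, new_write):
            os.close(fd)
        helper_end.close()
        manager_end.close()
```

A manager could then hand the received descriptors to the child process, for instance with `subprocess.Popen(..., pass_fds=fds)`. How their numbers are encoded in "ebpf_rss_fds" is defined by the patch series and is not shown here.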

On Fri, Jun 11, 2021 at 8:36 AM Jason Wang <jasowang@redhat.com> wrote:

>
> On 2021/6/10 at 2:55 PM, Yuri Benditovich wrote:
> > On Thu, Jun 10, 2021 at 9:41 AM Jason Wang <jasowang@redhat.com> wrote:
> >> On 2021/6/9 at 6:04 PM, Andrew Melnychenko wrote:
> >>> Libvirt usually launches qemu with strict permissions.
> >>> To enable eBPF RSS steering, qemu-ebpf-rss-helper was added.
> >> A silly question:
> >>
> >> Kernel had the following permission checks in bpf syscall:
> >>
> >>         if (sysctl_unprivileged_bpf_disabled && !bpf_capable())
> >>                 return -EPERM;
> >> ...
> >>
> >>         err = security_bpf(cmd, &attr, size);
> >>         if (err < 0)
> >>                 return err;
> >>
> >> So if I understand the code correctly, bpf syscall can only be done if:
> >>
> >> 1) unprivileged_bpf is enabled or
> >> 2) has the capability  and pass the LSM checks
> >>
> >> So I think the series is for unprivileged_bpf disabled. If I'm not
> >> wrong, I guess the policy is to grant CAP_BPF but do fine grain checks
> >> via LSM.
> >>
> >> If this is correct, need to describe it in the commit log.
> >>
> >>
> >>> Added property "ebpf_rss_fds" for "virtio-net" that allows to
> >>> initialize eBPF RSS context with passed program & maps fds.
> >>>
> >>> Added qemu-ebpf-rss-helper - simple helper that loads eBPF
> >>> context and passes fds through unix socket.
> >>> Libvirt should call the helper and pass fds to qemu through
> >>> "ebpf_rss_fds" property.
> >>>
> >>> Added explicit target OS check for libbpf dependency in meson.
> >>> eBPF RSS works only with Linux TAP, so there is no reason to
> >>> build eBPF loader/helper for non-Linux.
> >>>
> >>> Overall, libvirt process should not be aware of the "interface"
> >>> of eBPF RSS, it will not be aware of eBPF maps/program "type" and
> >>> their quantity.
> >> I'm not sure this is the best. We have several examples that let libvirt
> >> to involve. Examples:
> >>
> >> 1) create TAP device (and the TUN_SETIFF)
> >>
> >> 2) open vhost devices
> >>
> >>
> >>>    That's why qemu and the helper should be from
> >>> the same build and be "synchronized". Technically each qemu may
> >>> have its own helper. That's why "query-helper-paths" qmp command
> >>> was added. Qemu should return the path to the helper that suits
> >>> and libvirt should use "that" helper for "that" emulator.
> >>>
> >>> qmp sample:
> >>> C: { "execute": "query-helper-paths" }
> >>> S: { "return": [
> >>>        {
> >>>          "name": "qemu-ebpf-rss-helper",
> >>>          "path": "/usr/local/libexec/qemu-ebpf-rss-helper"
> >>>        }
> >>>       ]
> >>>      }
> >> I think we need an example on the detail steps for how libvirt is
> >> expected to use this.
> > The preliminary patches for libvirt are at
> > https://github.com/daynix/libvirt/tree/RSSv1
>
> Will have a look but it would be better if the assumption of the
> management is detailed here to ease the reviewers.
>
> Thanks

