From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FAC7C43460 for ; Fri, 23 Apr 2021 16:39:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E19F960BD3 for ; Fri, 23 Apr 2021 16:39:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231560AbhDWQjs (ORCPT ); Fri, 23 Apr 2021 12:39:48 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:36515 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229957AbhDWQjm (ORCPT ); Fri, 23 Apr 2021 12:39:42 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1619195945; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1GQkBMDQ1KuOta/J/0gSTIN3uzemBUynRDLUIAu7wKk=; b=BL68RlStI9T2WRmztfEym7Vsp8BxpcmEbaeGO8IjGm81+KFlejIzd/Zdmdi9cVNuqUoLCQ M9Fv7WNdL1dGMWf2mY3VZ5tgxmVpDrc96nfGEuHPnZs+ntA0ogDNAUXv8zJhupJMdGomIj FsoeQF/nv8FiZPfwEJJwS7cJ+a4zU7o= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-221-u8ZHLF9kONSSl2ho-aJrMw-1; Fri, 23 Apr 2021 12:39:01 -0400 X-MC-Unique: u8ZHLF9kONSSl2ho-aJrMw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EFBD584BA43; Fri, 23 Apr 2021 16:38:56 +0000 (UTC) Received: from redhat.com (ovpn-114-21.phx2.redhat.com [10.3.114.21]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7B0F95D720; Fri, 23 Apr 2021 16:38:51 +0000 (UTC) Date: Fri, 23 Apr 2021 10:38:51 -0600 From: Alex Williamson To: Jason Gunthorpe Cc: "Liu, Yi L" , Jacob Pan , Auger Eric , Jean-Philippe Brucker , "Tian, Kevin" , LKML , Joerg Roedel , Lu Baolu , David Woodhouse , "iommu@lists.linux-foundation.org" , "cgroups@vger.kernel.org" , Tejun Heo , Li Zefan , Johannes Weiner , Jean-Philippe Brucker , Jonathan Corbet , "Raj, Ashok" , "Wu, Hao" , "Jiang, Dave" , David Gibson , Alexey Kardashevskiy Subject: Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs Message-ID: <20210423103851.41138791@redhat.com> In-Reply-To: <20210422233950.GD1370958@nvidia.com> References: <20210421162307.GM1370958@nvidia.com> <20210421105451.56d3670a@redhat.com> <20210421175203.GN1370958@nvidia.com> <20210421133312.15307c44@redhat.com> <20210421230301.GP1370958@nvidia.com> <20210422111337.6ac3624d@redhat.com> <20210422175715.GA1370958@nvidia.com> <20210422133747.23322269@redhat.com> <20210422200024.GC1370958@nvidia.com> <20210422163808.2d173225@redhat.com> <20210422233950.GD1370958@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 22 Apr 2021 20:39:50 -0300 Jason Gunthorpe wrote: > On Thu, Apr 22, 2021 at 04:38:08PM -0600, Alex Williamson wrote: > > > Because it's fundamental to the isolation of the device? What you're > > proposing doesn't get around the group issue, it just makes it implicit > > rather than explicit in the uapi. > > I'm not even sure it makes it explicit or implicit, it just takes away > the FD. > > There are four group IOCTLs, I see them mapping to /dev/ioasid follows: > VFIO_GROUP_GET_STATUS - > + VFIO_GROUP_FLAGS_CONTAINER_SET is fairly redundant > + VFIO_GROUP_FLAGS_VIABLE could be in a new sysfs under > kernel/iomm_groups, or could be an IOCTL on /dev/ioasid > IOASID_ALL_DEVICES_VIABLE > > VFIO_GROUP_SET_CONTAINER - > + This happens implicitly when the device joins the IOASID > so it gets moved to the vfio_device FD: > ioctl(vifo_device_fd, JOIN_IOASID_FD, ioasifd) > > VFIO_GROUP_UNSET_CONTAINER - > + Also moved to the vfio_device FD, opposite of JOIN_IOASID_FD > > VFIO_GROUP_GET_DEVICE_FD - > + Replaced by opening /dev/vfio/deviceX > Learn the deviceX which will be the cdev sysfs shows as: > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/vfio/deviceX/dev > Open /dev/vfio/deviceX > > > > How do we model the VFIO group security concept to something like > > > VDPA? > > > > Is it really a "VFIO group security concept"? We're reflecting the > > reality of the hardware, not all devices are fully isolated. > > Well, exactly. > > /dev/ioasid should understand the group concept somehow, otherwise it > is incomplete and maybe even security broken. > > So, how do I add groups to, say, VDPA in a way that makes sense? The > only answer I come to is broadly what I outlined here - make > /dev/ioasid do all the group operations, and do them when we enjoin > the VDPA device to the ioasid. > > Once I have solved all the groups problems with the non-VFIO users, > then where does that leave VFIO? Why does VFIO need a group FD if > everyone else doesn't? This assumes there's a solution for vDPA that doesn't just ignore the problem and hope for the best. I can't speak to a vDPA solution. > > IOMMU group. This is the reality that any userspace driver needs to > > play in, it doesn't magically go away because we drop the group file > > descriptor. > > I'm not saying it does, I'm saying it makes the uAPI more regular and > easier to fit into /dev/ioasid without the group FD. > > > It only makes the uapi more difficult to use correctly because > > userspace drivers need to go outside of the uapi to have any idea > > that this restriction exists. > > I don't think it makes any substantive difference one way or the > other. > > With the group FD: the userspace has to read sysfs, find the list of > devices in the group, open the group fd, create device FDs for each > device using the name from sysfs. > > Starting from a BDF the general pseudo code is > group_path = readlink("/sys/bus/pci/devices/BDF/iommu_group") > group_name = basename(group_path) > group_fd = open("/dev/vfio/"+group_name) > device_fd = ioctl(VFIO_GROUP_GET_DEVICE_FD, BDF); > > Without the group FD: the userspace has to read sysfs, find the list > of devices in the group and then open the device-specific cdev (found > via sysfs) and link them to a /dev/ioasid FD. > > Starting from a BDF the general pseudo code is: > device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/") > device_fd = open("/dev/vfio/"+device_name) > ioasidfd = open("/dev/ioasid") > ioctl(device_fd, JOIN_IOASID_FD, ioasidfd) This is exactly the implicit vs explicit semantics. In the existing vfio case, the user needs to explicitly interact with the group. In your proposal, the user interacts with the device, the group concept is an implicit restriction. You've listed a step in the description about a "list of devices in the group", but nothing in the pseudo code reflects that step. I expect it would be a subtly missed by any userspace driver developer unless they happen to work on a system where the grouping is not ideal. I think that makes the interface hard to use correctly. > These two routes can have identical outcomes and identical security > checks. > > In both cases if userspace wants a list of BDFs in the same group as > the BDF it is interested in: > readdir("/sys/bus/pci/devices/BDF/iommu_group/devices") > > It seems like a very small difference to me. The difference is that the group becomes a nuance, that I expect would be ignored, rather than a first class concept in the API. > I still don't see how the group restriction gets surfaced to the > application through the group FD. The applications I looked through > just treat the group FD as a step on their way to get the device_fd. A step where the developer hopefully recognizes that there might be other devices in a group, a group can't be opened more than once, the group has a flag indicating viability, they can't actually get the device fd until the group is attached to an IOMMU context, all devices in the group therefore have the same IOMMU context, and device fd isn't actually available until they've gone through a proper setup, which is an additional layer of protection. A userspace vfio developer may not understand why a group isn't viable, but they have that clue that it's something at the group level to investigate. Most of the userspace vfio drivers that I haven't contributed myself have come about with little or no interaction from me, so I'd like to think that the vfio uapi is actually somewhat intuitive in its concepts and difficult to use incorrectly. Thanks, Alex From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D02C7C433B4 for ; Fri, 23 Apr 2021 16:39:11 +0000 (UTC) Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5C2A8611BF for ; Fri, 23 Apr 2021 16:39:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5C2A8611BF Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=iommu-bounces@lists.linux-foundation.org Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 2043F605E8; Fri, 23 Apr 2021 16:39:11 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id zYA_WHdEsonv; Fri, 23 Apr 2021 16:39:10 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp3.osuosl.org (Postfix) with ESMTP id E1C3B6066E; Fri, 23 Apr 2021 16:39:09 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id B0EE6C000E; Fri, 23 Apr 2021 16:39:09 +0000 (UTC) Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0E453C000B for ; Fri, 23 Apr 2021 16:39:08 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id DBD6984384 for ; Fri, 23 Apr 2021 16:39:07 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp1.osuosl.org (amavisd-new); dkim=pass (1024-bit key) header.d=redhat.com Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id y6vVcvcOAOrh for ; Fri, 23 Apr 2021 16:39:06 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id 7DE1B8437E for ; Fri, 23 Apr 2021 16:39:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1619195945; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1GQkBMDQ1KuOta/J/0gSTIN3uzemBUynRDLUIAu7wKk=; b=BL68RlStI9T2WRmztfEym7Vsp8BxpcmEbaeGO8IjGm81+KFlejIzd/Zdmdi9cVNuqUoLCQ M9Fv7WNdL1dGMWf2mY3VZ5tgxmVpDrc96nfGEuHPnZs+ntA0ogDNAUXv8zJhupJMdGomIj FsoeQF/nv8FiZPfwEJJwS7cJ+a4zU7o= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-221-u8ZHLF9kONSSl2ho-aJrMw-1; Fri, 23 Apr 2021 12:39:01 -0400 X-MC-Unique: u8ZHLF9kONSSl2ho-aJrMw-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id EFBD584BA43; Fri, 23 Apr 2021 16:38:56 +0000 (UTC) Received: from redhat.com (ovpn-114-21.phx2.redhat.com [10.3.114.21]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7B0F95D720; Fri, 23 Apr 2021 16:38:51 +0000 (UTC) Date: Fri, 23 Apr 2021 10:38:51 -0600 From: Alex Williamson To: Jason Gunthorpe Subject: Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs Message-ID: <20210423103851.41138791@redhat.com> In-Reply-To: <20210422233950.GD1370958@nvidia.com> References: <20210421162307.GM1370958@nvidia.com> <20210421105451.56d3670a@redhat.com> <20210421175203.GN1370958@nvidia.com> <20210421133312.15307c44@redhat.com> <20210421230301.GP1370958@nvidia.com> <20210422111337.6ac3624d@redhat.com> <20210422175715.GA1370958@nvidia.com> <20210422133747.23322269@redhat.com> <20210422200024.GC1370958@nvidia.com> <20210422163808.2d173225@redhat.com> <20210422233950.GD1370958@nvidia.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 Cc: Jean-Philippe Brucker , "Tian, Kevin" , "Jiang, Dave" , "Raj, Ashok" , Jonathan Corbet , Jean-Philippe Brucker , Li Zefan , LKML , "iommu@lists.linux-foundation.org" , David Gibson , Johannes Weiner , Tejun Heo , "cgroups@vger.kernel.org" , "Wu, Hao" , David Woodhouse X-BeenThere: iommu@lists.linux-foundation.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: Development issues for Linux IOMMU support List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: iommu-bounces@lists.linux-foundation.org Sender: "iommu" On Thu, 22 Apr 2021 20:39:50 -0300 Jason Gunthorpe wrote: > On Thu, Apr 22, 2021 at 04:38:08PM -0600, Alex Williamson wrote: > > > Because it's fundamental to the isolation of the device? What you're > > proposing doesn't get around the group issue, it just makes it implicit > > rather than explicit in the uapi. > > I'm not even sure it makes it explicit or implicit, it just takes away > the FD. > > There are four group IOCTLs, I see them mapping to /dev/ioasid follows: > VFIO_GROUP_GET_STATUS - > + VFIO_GROUP_FLAGS_CONTAINER_SET is fairly redundant > + VFIO_GROUP_FLAGS_VIABLE could be in a new sysfs under > kernel/iomm_groups, or could be an IOCTL on /dev/ioasid > IOASID_ALL_DEVICES_VIABLE > > VFIO_GROUP_SET_CONTAINER - > + This happens implicitly when the device joins the IOASID > so it gets moved to the vfio_device FD: > ioctl(vifo_device_fd, JOIN_IOASID_FD, ioasifd) > > VFIO_GROUP_UNSET_CONTAINER - > + Also moved to the vfio_device FD, opposite of JOIN_IOASID_FD > > VFIO_GROUP_GET_DEVICE_FD - > + Replaced by opening /dev/vfio/deviceX > Learn the deviceX which will be the cdev sysfs shows as: > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/vfio/deviceX/dev > Open /dev/vfio/deviceX > > > > How do we model the VFIO group security concept to something like > > > VDPA? > > > > Is it really a "VFIO group security concept"? We're reflecting the > > reality of the hardware, not all devices are fully isolated. > > Well, exactly. > > /dev/ioasid should understand the group concept somehow, otherwise it > is incomplete and maybe even security broken. > > So, how do I add groups to, say, VDPA in a way that makes sense? The > only answer I come to is broadly what I outlined here - make > /dev/ioasid do all the group operations, and do them when we enjoin > the VDPA device to the ioasid. > > Once I have solved all the groups problems with the non-VFIO users, > then where does that leave VFIO? Why does VFIO need a group FD if > everyone else doesn't? This assumes there's a solution for vDPA that doesn't just ignore the problem and hope for the best. I can't speak to a vDPA solution. > > IOMMU group. This is the reality that any userspace driver needs to > > play in, it doesn't magically go away because we drop the group file > > descriptor. > > I'm not saying it does, I'm saying it makes the uAPI more regular and > easier to fit into /dev/ioasid without the group FD. > > > It only makes the uapi more difficult to use correctly because > > userspace drivers need to go outside of the uapi to have any idea > > that this restriction exists. > > I don't think it makes any substantive difference one way or the > other. > > With the group FD: the userspace has to read sysfs, find the list of > devices in the group, open the group fd, create device FDs for each > device using the name from sysfs. > > Starting from a BDF the general pseudo code is > group_path = readlink("/sys/bus/pci/devices/BDF/iommu_group") > group_name = basename(group_path) > group_fd = open("/dev/vfio/"+group_name) > device_fd = ioctl(VFIO_GROUP_GET_DEVICE_FD, BDF); > > Without the group FD: the userspace has to read sysfs, find the list > of devices in the group and then open the device-specific cdev (found > via sysfs) and link them to a /dev/ioasid FD. > > Starting from a BDF the general pseudo code is: > device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/") > device_fd = open("/dev/vfio/"+device_name) > ioasidfd = open("/dev/ioasid") > ioctl(device_fd, JOIN_IOASID_FD, ioasidfd) This is exactly the implicit vs explicit semantics. In the existing vfio case, the user needs to explicitly interact with the group. In your proposal, the user interacts with the device, the group concept is an implicit restriction. You've listed a step in the description about a "list of devices in the group", but nothing in the pseudo code reflects that step. I expect it would be a subtly missed by any userspace driver developer unless they happen to work on a system where the grouping is not ideal. I think that makes the interface hard to use correctly. > These two routes can have identical outcomes and identical security > checks. > > In both cases if userspace wants a list of BDFs in the same group as > the BDF it is interested in: > readdir("/sys/bus/pci/devices/BDF/iommu_group/devices") > > It seems like a very small difference to me. The difference is that the group becomes a nuance, that I expect would be ignored, rather than a first class concept in the API. > I still don't see how the group restriction gets surfaced to the > application through the group FD. The applications I looked through > just treat the group FD as a step on their way to get the device_fd. A step where the developer hopefully recognizes that there might be other devices in a group, a group can't be opened more than once, the group has a flag indicating viability, they can't actually get the device fd until the group is attached to an IOMMU context, all devices in the group therefore have the same IOMMU context, and device fd isn't actually available until they've gone through a proper setup, which is an additional layer of protection. A userspace vfio developer may not understand why a group isn't viable, but they have that clue that it's something at the group level to investigate. Most of the userspace vfio drivers that I haven't contributed myself have come about with little or no interaction from me, so I'd like to think that the vfio uapi is actually somewhat intuitive in its concepts and difficult to use incorrectly. Thanks, Alex _______________________________________________ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu From mboxrd@z Thu Jan 1 00:00:00 1970 From: Alex Williamson Subject: Re: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs Date: Fri, 23 Apr 2021 10:38:51 -0600 Message-ID: <20210423103851.41138791@redhat.com> References: <20210421162307.GM1370958@nvidia.com> <20210421105451.56d3670a@redhat.com> <20210421175203.GN1370958@nvidia.com> <20210421133312.15307c44@redhat.com> <20210421230301.GP1370958@nvidia.com> <20210422111337.6ac3624d@redhat.com> <20210422175715.GA1370958@nvidia.com> <20210422133747.23322269@redhat.com> <20210422200024.GC1370958@nvidia.com> <20210422163808.2d173225@redhat.com> <20210422233950.GD1370958@nvidia.com> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Return-path: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1619195945; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1GQkBMDQ1KuOta/J/0gSTIN3uzemBUynRDLUIAu7wKk=; b=BL68RlStI9T2WRmztfEym7Vsp8BxpcmEbaeGO8IjGm81+KFlejIzd/Zdmdi9cVNuqUoLCQ M9Fv7WNdL1dGMWf2mY3VZ5tgxmVpDrc96nfGEuHPnZs+ntA0ogDNAUXv8zJhupJMdGomIj FsoeQF/nv8FiZPfwEJJwS7cJ+a4zU7o= In-Reply-To: <20210422233950.GD1370958-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> List-ID: Content-Type: text/plain; charset="us-ascii" To: Jason Gunthorpe Cc: "Liu, Yi L" , Jacob Pan , Auger Eric , Jean-Philippe Brucker , "Tian, Kevin" , LKML , Joerg Roedel , Lu Baolu , David Woodhouse , "iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org" , "cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" , Tejun Heo , Li Zefan , Johannes Weiner , Jean-Philippe Brucker , Jonathan Corbet , "Raj, Ashok" , "Wu, Hao" , "Jiang, Dave" , David On Thu, 22 Apr 2021 20:39:50 -0300 Jason Gunthorpe wrote: > On Thu, Apr 22, 2021 at 04:38:08PM -0600, Alex Williamson wrote: > > > Because it's fundamental to the isolation of the device? What you're > > proposing doesn't get around the group issue, it just makes it implicit > > rather than explicit in the uapi. > > I'm not even sure it makes it explicit or implicit, it just takes away > the FD. > > There are four group IOCTLs, I see them mapping to /dev/ioasid follows: > VFIO_GROUP_GET_STATUS - > + VFIO_GROUP_FLAGS_CONTAINER_SET is fairly redundant > + VFIO_GROUP_FLAGS_VIABLE could be in a new sysfs under > kernel/iomm_groups, or could be an IOCTL on /dev/ioasid > IOASID_ALL_DEVICES_VIABLE > > VFIO_GROUP_SET_CONTAINER - > + This happens implicitly when the device joins the IOASID > so it gets moved to the vfio_device FD: > ioctl(vifo_device_fd, JOIN_IOASID_FD, ioasifd) > > VFIO_GROUP_UNSET_CONTAINER - > + Also moved to the vfio_device FD, opposite of JOIN_IOASID_FD > > VFIO_GROUP_GET_DEVICE_FD - > + Replaced by opening /dev/vfio/deviceX > Learn the deviceX which will be the cdev sysfs shows as: > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/vfio/deviceX/dev > Open /dev/vfio/deviceX > > > > How do we model the VFIO group security concept to something like > > > VDPA? > > > > Is it really a "VFIO group security concept"? We're reflecting the > > reality of the hardware, not all devices are fully isolated. > > Well, exactly. > > /dev/ioasid should understand the group concept somehow, otherwise it > is incomplete and maybe even security broken. > > So, how do I add groups to, say, VDPA in a way that makes sense? The > only answer I come to is broadly what I outlined here - make > /dev/ioasid do all the group operations, and do them when we enjoin > the VDPA device to the ioasid. > > Once I have solved all the groups problems with the non-VFIO users, > then where does that leave VFIO? Why does VFIO need a group FD if > everyone else doesn't? This assumes there's a solution for vDPA that doesn't just ignore the problem and hope for the best. I can't speak to a vDPA solution. > > IOMMU group. This is the reality that any userspace driver needs to > > play in, it doesn't magically go away because we drop the group file > > descriptor. > > I'm not saying it does, I'm saying it makes the uAPI more regular and > easier to fit into /dev/ioasid without the group FD. > > > It only makes the uapi more difficult to use correctly because > > userspace drivers need to go outside of the uapi to have any idea > > that this restriction exists. > > I don't think it makes any substantive difference one way or the > other. > > With the group FD: the userspace has to read sysfs, find the list of > devices in the group, open the group fd, create device FDs for each > device using the name from sysfs. > > Starting from a BDF the general pseudo code is > group_path = readlink("/sys/bus/pci/devices/BDF/iommu_group") > group_name = basename(group_path) > group_fd = open("/dev/vfio/"+group_name) > device_fd = ioctl(VFIO_GROUP_GET_DEVICE_FD, BDF); > > Without the group FD: the userspace has to read sysfs, find the list > of devices in the group and then open the device-specific cdev (found > via sysfs) and link them to a /dev/ioasid FD. > > Starting from a BDF the general pseudo code is: > device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/") > device_fd = open("/dev/vfio/"+device_name) > ioasidfd = open("/dev/ioasid") > ioctl(device_fd, JOIN_IOASID_FD, ioasidfd) This is exactly the implicit vs explicit semantics. In the existing vfio case, the user needs to explicitly interact with the group. In your proposal, the user interacts with the device, the group concept is an implicit restriction. You've listed a step in the description about a "list of devices in the group", but nothing in the pseudo code reflects that step. I expect it would be a subtly missed by any userspace driver developer unless they happen to work on a system where the grouping is not ideal. I think that makes the interface hard to use correctly. > These two routes can have identical outcomes and identical security > checks. > > In both cases if userspace wants a list of BDFs in the same group as > the BDF it is interested in: > readdir("/sys/bus/pci/devices/BDF/iommu_group/devices") > > It seems like a very small difference to me. The difference is that the group becomes a nuance, that I expect would be ignored, rather than a first class concept in the API. > I still don't see how the group restriction gets surfaced to the > application through the group FD. The applications I looked through > just treat the group FD as a step on their way to get the device_fd. A step where the developer hopefully recognizes that there might be other devices in a group, a group can't be opened more than once, the group has a flag indicating viability, they can't actually get the device fd until the group is attached to an IOMMU context, all devices in the group therefore have the same IOMMU context, and device fd isn't actually available until they've gone through a proper setup, which is an additional layer of protection. A userspace vfio developer may not understand why a group isn't viable, but they have that clue that it's something at the group level to investigate. Most of the userspace vfio drivers that I haven't contributed myself have come about with little or no interaction from me, so I'd like to think that the vfio uapi is actually somewhat intuitive in its concepts and difficult to use incorrectly. Thanks, Alex