On Fri, Apr 23, 2021 at 10:31:46AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe
> > Sent: Friday, April 23, 2021 7:40 AM
> >
> > On Thu, Apr 22, 2021 at 04:38:08PM -0600, Alex Williamson wrote:
> >
> > > Because it's fundamental to the isolation of the device?  What you're
> > > proposing doesn't get around the group issue, it just makes it implicit
> > > rather than explicit in the uapi.
> >
> > I'm not even sure it makes it explicit or implicit, it just takes away
> > the FD.
> >
> > There are four group IOCTLs; I see them mapping to /dev/ioasid as follows:
> >
> >  VFIO_GROUP_GET_STATUS -
> >    + VFIO_GROUP_FLAGS_CONTAINER_SET is fairly redundant
> >    + VFIO_GROUP_FLAGS_VIABLE could be in a new sysfs under
> >      kernel/iommu_groups, or could be an IOCTL on /dev/ioasid:
> >      IOASID_ALL_DEVICES_VIABLE
> >
> >  VFIO_GROUP_SET_CONTAINER -
> >    + This happens implicitly when the device joins the IOASID,
> >      so it gets moved to the vfio_device FD:
> >       ioctl(vfio_device_fd, JOIN_IOASID_FD, ioasid_fd)
> >
> >  VFIO_GROUP_UNSET_CONTAINER -
> >    + Also moved to the vfio_device FD, opposite of JOIN_IOASID_FD
> >
> >  VFIO_GROUP_GET_DEVICE_FD -
> >    + Replaced by opening /dev/vfio/deviceX
> >      Learn the deviceX from the cdev sysfs, shown as:
> >       /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/vfio/deviceX/dev
> >      Open /dev/vfio/deviceX
> >
> > > > How do we model the VFIO group security concept to something like
> > > > VDPA?
> > >
> > > Is it really a "VFIO group security concept"?  We're reflecting the
> > > reality of the hardware, not all devices are fully isolated.
> >
> > Well, exactly.
> >
> > /dev/ioasid should understand the group concept somehow, otherwise it
> > is incomplete and maybe even security broken.
> >
> > So, how do I add groups to, say, VDPA in a way that makes sense?  The
> > only answer I come to is broadly what I outlined here - make
> > /dev/ioasid do all the group operations, and do them when we enjoin
> > the VDPA device to the ioasid.
> >
> > Once I have solved all the group problems with the non-VFIO users,
> > then where does that leave VFIO?  Why does VFIO need a group FD if
> > everyone else doesn't?
> >
> > > IOMMU group.  This is the reality that any userspace driver needs to
> > > play in, it doesn't magically go away because we drop the group file
> > > descriptor.
> >
> > I'm not saying it does, I'm saying it makes the uAPI more regular and
> > easier to fit into /dev/ioasid without the group FD.
> >
> > > It only makes the uapi more difficult to use correctly because
> > > userspace drivers need to go outside of the uapi to have any idea
> > > that this restriction exists.
> >
> > I don't think it makes any substantive difference one way or the
> > other.
> >
> > With the group FD: the userspace has to read sysfs, find the list of
> > devices in the group, open the group fd, and create device FDs for each
> > device using the name from sysfs.
> >
> > Starting from a BDF the general pseudo code is:
> >
> >  group_path = readlink("/sys/bus/pci/devices/BDF/iommu_group")
> >  group_name = basename(group_path)
> >  group_fd = open("/dev/vfio/" + group_name)
> >  device_fd = ioctl(group_fd, VFIO_GROUP_GET_DEVICE_FD, BDF)
> >
> > Without the group FD: the userspace has to read sysfs, find the list
> > of devices in the group, and then open the device-specific cdev (found
> > via sysfs) and link them to a /dev/ioasid FD.
> >
> > Starting from a BDF the general pseudo code is:
> >
> >  device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/")
> >  device_fd = open("/dev/vfio/" + device_name)
> >  ioasid_fd = open("/dev/ioasid")
> >  ioctl(device_fd, JOIN_IOASID_FD, ioasid_fd)
> >
> > These two routes can have identical outcomes and identical security
> > checks.
> >
> > In both cases if userspace wants a list of BDFs in the same group as
> > the BDF it is interested in:
> >
> >  readdir("/sys/bus/pci/devices/BDF/iommu_group/devices")
> >
> > It seems like a very small difference to me.
> >
> > I still don't see how the group restriction gets surfaced to the
> > application through the group FD.  The applications I looked through
> > just treat the group FD as a step on their way to get the device_fd.
> >
>
> So your proposal sort of moves the entire container/group/domain
> management into /dev/ioasid and then leaves vfio to provide only
> device-specific uAPI.  An ioasid represents a page table (address
> space), and is thus equivalent in scope to a VFIO container.

Right.  I don't really know how /dev/ioasid is supposed to work, and
so far I don't see how it conceptually differs from a container.  What
is it adding?

> Having the device join
> an ioasid is equivalent to attaching a device to a VFIO container, and
> here the group integrity must be enforced.  Then /dev/ioasid anyway
> needs to manage group objects and their association with the ioasid
> and the underlying iommu domain, thus it's pointless to keep the same
> logic within VFIO.  Is this understanding correct?
>
> btw one remaining open is whether you expect /dev/ioasid to be
> associated with a single iommu domain, or multiple.  If only a single
> domain is allowed, the ioasid_fd is equivalent in scope to a VFIO
> container.  It is supposed to have only one gpa_ioasid_id since one
> iommu domain can only have a single 2nd-level pgtable.  Then all other
> ioasids, once allocated, must be nested on this gpa_ioasid_id to fit
> in the same domain.  If a legacy vIOMMU is exposed (which disallows
> nesting), the userspace has to open an ioasid_fd for every group.
> This is basically the VFIO way.  On the other hand, if multiple
> domains are allowed, there could be multiple ioasid_ids, each holding
> a 2nd-level pgtable and an iommu domain (or a list of pgtables and
> domains due to the incompatibility issue discussed in another thread),
> and each can be nested by other ioasids.
> The application only needs
> to open /dev/ioasid once, regardless of whether the vIOMMU allows
> nesting, and has a single interface for ioasid allocation.  Which way
> do you prefer?
>
> Thanks
> Kevin
>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson