On Thu, Apr 22, 2021 at 08:39:50PM -0300, Jason Gunthorpe wrote: > On Thu, Apr 22, 2021 at 04:38:08PM -0600, Alex Williamson wrote: > > > Because it's fundamental to the isolation of the device? What you're > > proposing doesn't get around the group issue, it just makes it implicit > > rather than explicit in the uapi. > > I'm not even sure it makes it explicit or implicit, it just takes away > the FD. > > There are four group IOCTLs, I see them mapping to /dev/ioasid follows: > VFIO_GROUP_GET_STATUS - > + VFIO_GROUP_FLAGS_CONTAINER_SET is fairly redundant > + VFIO_GROUP_FLAGS_VIABLE could be in a new sysfs under > kernel/iomm_groups, or could be an IOCTL on /dev/ioasid > IOASID_ALL_DEVICES_VIABLE > > VFIO_GROUP_SET_CONTAINER - > + This happens implicitly when the device joins the IOASID > so it gets moved to the vfio_device FD: > ioctl(vifo_device_fd, JOIN_IOASID_FD, ioasifd) > > VFIO_GROUP_UNSET_CONTAINER - > + Also moved to the vfio_device FD, opposite of JOIN_IOASID_FD > > VFIO_GROUP_GET_DEVICE_FD - > + Replaced by opening /dev/vfio/deviceX > Learn the deviceX which will be the cdev sysfs shows as: > /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/vfio/deviceX/dev > Open /dev/vfio/deviceX > > > > How do we model the VFIO group security concept to something like > > > VDPA? > > > > Is it really a "VFIO group security concept"? We're reflecting the > > reality of the hardware, not all devices are fully isolated. > > Well, exactly. > > /dev/ioasid should understand the group concept somehow, otherwise it > is incomplete and maybe even security broken. > > So, how do I add groups to, say, VDPA in a way that makes sense? The > only answer I come to is broadly what I outlined here - make > /dev/ioasid do all the group operations, and do them when we enjoin > the VDPA device to the ioasid. > > Once I have solved all the groups problems with the non-VFIO users, > then where does that leave VFIO? Why does VFIO need a group FD if > everyone else doesn't? > > > IOMMU group. This is the reality that any userspace driver needs to > > play in, it doesn't magically go away because we drop the group file > > descriptor. > > I'm not saying it does, I'm saying it makes the uAPI more regular and > easier to fit into /dev/ioasid without the group FD. > > > It only makes the uapi more difficult to use correctly because > > userspace drivers need to go outside of the uapi to have any idea > > that this restriction exists. > > I don't think it makes any substantive difference one way or the > other. > > With the group FD: the userspace has to read sysfs, find the list of > devices in the group, open the group fd, create device FDs for each > device using the name from sysfs. > > Starting from a BDF the general pseudo code is > group_path = readlink("/sys/bus/pci/devices/BDF/iommu_group") > group_name = basename(group_path) > group_fd = open("/dev/vfio/"+group_name) > device_fd = ioctl(VFIO_GROUP_GET_DEVICE_FD, BDF); > > Without the group FD: the userspace has to read sysfs, find the list > of devices in the group and then open the device-specific cdev (found > via sysfs) and link them to a /dev/ioasid FD. > > Starting from a BDF the general pseudo code is: > device_name = first_directory_of("/sys/bus/pci/devices/BDF/vfio/") > device_fd = open("/dev/vfio/"+device_name) > ioasidfd = open("/dev/ioasid") > ioctl(device_fd, JOIN_IOASID_FD, ioasidfd) This line is the problem. [Historical aside: Alex's early drafts for the VFIO interface looked quite similar to this. Ben Herrenschmidt and myself persuaded him it was a bad idea, and groups were developed instead. I still think it's a bad idea, and not just for POWER] As Alex says, if this line fails because of the group restrictions, that's not great because it's not very obvious what's gone wrong. But IMO, the success path on a multi-device group is kind of worse: you've now made made a meaningful and visible change to the setup of devices which are not mentioned in this line *at all*. If you've changed the DMA address space of this device you've also changed it for everything else in the group - there's no getting around that. For both those reasons, I absolutely agree with Alex that retaining the explicit group model is valuable. Yes, it makes set up more of a pain, but it's necessary complexity to actually understand what's going on here. > These two routes can have identical outcomes and identical security > checks. > > In both cases if userspace wants a list of BDFs in the same group as > the BDF it is interested in: > readdir("/sys/bus/pci/devices/BDF/iommu_group/devices") > > It seems like a very small difference to me. > > I still don't see how the group restriction gets surfaced to the > application through the group FD. The applications I looked through > just treat the group FD as a step on their way to get the device_fd. > > Jason > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson