Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs

From: Alex Williamson <alex.williamson@redhat.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: "Liu, Yi L" <yi.l.liu@intel.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"jacob.jun.pan@linux.intel.com" <jacob.jun.pan@linux.intel.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Tian, Jun J" <jun.j.tian@intel.com>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	"jean-philippe@linaro.org" <jean-philippe@linaro.org>,
	"peterx@redhat.com" <peterx@redhat.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"Wu, Hao" <hao.wu@intel.com>
Subject: Re: [PATCH v1 2/2] vfio/pci: Emulate PASID/PRI capability for VFs
Date: Tue, 7 Apr 2020 09:58:01 -0600
Message-ID: <20200407095801.648b1371@w520.home>
In-Reply-To: <AADFC41AFE54684AB9EE6CBC0274A5D19D80E13D@SHSMSX104.ccr.corp.intel.com>

On Tue, 7 Apr 2020 04:26:23 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Saturday, April 4, 2020 1:26 AM  
> [...]
> > > > > +	if (!pasid_cap.control_reg.paside) {
> > > > > +		pr_debug("%s: its PF's PASID capability is not enabled\n",
> > > > > +			dev_name(&vdev->pdev->dev));
> > > > > +		ret = 0;
> > > > > +		goto out;
> > > > > +	}  
> > > >
> > > > What happens if the PF's PASID gets disabled while we're using it??  
> > >
> > > This is actually the open I highlighted in cover letter. Per the reply
> > > from Baolu, this seems to be an open for bare-metal all the same.
> > > https://lkml.org/lkml/2020/3/31/95  
> > 
> > Seems that needs to get sorted out before we can expose this.  Maybe
> > some sort of registration with the PF driver that PASID is being used
> > by a VF so it cannot be disabled?  
> 
> I guess we may do vSVA for PF first, and then add VF vSVA later
> given the above additional need. It's not necessary to enable both
> in one step.
> 
> [...]
> > > > > > @@ -1604,6 +1901,18 @@ static int vfio_ecap_init(struct vfio_pci_device *vdev)
> > > > >  	if (!ecaps)
> > > > >  		*(u32 *)&vdev->vconfig[PCI_CFG_SPACE_SIZE] = 0;
> > > > >
> > > > > +#ifdef CONFIG_PCI_ATS
> > > > > +	if (pdev->is_virtfn) {
> > > > > +		struct pci_dev *physfn = pdev->physfn;
> > > > > +
> > > > > +		ret = vfio_pci_add_emulated_cap_for_vf(vdev,
> > > > > +					physfn, epos_max, prev);
> > > > > +		if (ret)
> > > > > +			pr_info("%s, failed to add special caps for VF %s\n",
> > > > > +				__func__, dev_name(&vdev->pdev->dev));
> > > > > +	}
> > > > > +#endif  
> > > >
> > > > I can only imagine that we should place the caps at the same location
> > > > they exist on the PF, we don't know what hidden registers might be
> > > > hiding in config space.  
> 
> Is there a vendor guarantee that hidden registers will be located at
> the same offset in both PF and VF config space?

I'm not sure if the spec really precludes hidden registers, but the
fact that these registers are explicitly outside of the capability
chain implies they're only intended for device specific use, so I'd say
there are no guarantees about anything related to these registers.
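
To make "outside of the capability chain" concrete, here is a minimal
userspace sketch (not the vfio code) of walking the extended capability
chain in a raw 4K config space image.  The header layout (ID in bits
15:0, next pointer in bits 31:20) follows the PCIe spec; the helper
names cfg_read32(), put_ext_cap() and walk_ext_caps() are made up for
illustration.  Any byte not claimed by a capability on this chain is
where a device specific register could be hiding.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SPACE_SIZE      0x100   /* end of conventional config space */
#define CFG_SPACE_EXP_SIZE  0x1000  /* end of PCIe extended config space */

/* Extended capability header: ID in bits 15:0, next pointer in bits 31:20. */
#define EXT_CAP_ID(hdr)     ((hdr) & 0xffffu)
#define EXT_CAP_NEXT(hdr)   (((hdr) >> 20) & 0xffcu)

/* Read a little-endian dword from a raw config space image. */
static uint32_t cfg_read32(const uint8_t *cfg, uint16_t pos)
{
	return (uint32_t)cfg[pos] | ((uint32_t)cfg[pos + 1] << 8) |
	       ((uint32_t)cfg[pos + 2] << 16) | ((uint32_t)cfg[pos + 3] << 24);
}

/* Hypothetical helper: write a version 1 capability header into the image. */
static void put_ext_cap(uint8_t *cfg, uint16_t pos, uint16_t id, uint16_t next)
{
	uint32_t hdr = (uint32_t)id | (1u << 16) | ((uint32_t)next << 20);

	assert(pos >= CFG_SPACE_SIZE && pos <= CFG_SPACE_EXP_SIZE - 4);
	assert((pos & 3) == 0 && (next & 3) == 0);
	cfg[pos] = hdr & 0xff;
	cfg[pos + 1] = (hdr >> 8) & 0xff;
	cfg[pos + 2] = (hdr >> 16) & 0xff;
	cfg[pos + 3] = (hdr >> 24) & 0xff;
}

/*
 * Walk the extended capability chain and record each header offset in
 * chain order.  Returns the number of capabilities found (at most max).
 */
static int walk_ext_caps(const uint8_t *cfg, uint16_t *offsets, int max)
{
	uint16_t pos = CFG_SPACE_SIZE;	/* first capability is always at 0x100 */
	int n = 0;
	/* Bound the walk so a looping chain cannot hang us. */
	int ttl = (CFG_SPACE_EXP_SIZE - CFG_SPACE_SIZE) / 8;

	while (pos >= CFG_SPACE_SIZE && n < max && ttl-- > 0) {
		uint32_t hdr = cfg_read32(cfg, pos);

		if (hdr == 0 || hdr == 0xffffffffu)	/* no capability here */
			break;
		offsets[n++] = pos;
		pos = (uint16_t)EXT_CAP_NEXT(hdr);
	}
	return n;
}
```

A chain built with put_ext_cap() as 0x100 -> 0x800 -> 0x200 is perfectly
legal, and walk_ext_caps() reports the offsets in that order, so "the
end of the last capability" is not a well defined place to append to.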

FWIW, vfio started out being more strict about restricting config space
access to defined capabilities, until...

commit a7d1ea1c11b33bda2691f3294b4d735ed635535a
Author: Alex Williamson <alex.williamson@redhat.com>
Date:   Mon Apr 1 09:04:12 2013 -0600

    vfio-pci: Enable raw access to unassigned config space

    Devices like be2net hide registers between the gaps in capabilities
    and architected regions of PCI config space.  Our choices to support
    such devices is to either build an ever growing and unmanageable white
    list or rely on hardware isolation to protect us.  These registers are
    really no different than MMIO or I/O port space registers, which we
    don't attempt to regulate, so treat PCI config space in the same way.

> > > > but we are not sure whether the same location is available on the VF.
> > > > In this patch, it actually places the emulated cap physically behind
> > > > the cap which lies farthest (its offset is largest) within the VF's
> > > > config space, as the PCIe caps are linked in a chain.
> > 
> > But, as we've found on Broadcom NICs (iirc), hardware developers have a
> > nasty habit of hiding random registers in PCI config space, outside of
> > defined capabilities.  I feel like IGD might even do this too, is that
> > true?  So I don't think we can guarantee that just because a section of
> > > config space isn't part of a defined capability that it's unused.  It
> > > only means that it's unused by common code, but it might have device
> > > specific purposes.  So if the PCIe spec indicates that VFs cannot
> > > include these capabilities and virtualization software needs to
> > emulate them, we need somewhere safe to place them in config space, and
> > simply placing them off the end of known capabilities doesn't give me
> > any confidence.  Also, hardware has no requirement to make compact use
> > of extended config space.  The first capability must be at 0x100, the
> > very next capability could consume all the way to the last byte of the
> > 4K extended range, and the next link in the chain could be somewhere in
> > the middle.  Thanks,
> >   
> 
> Then what would be a viable option? A vendor's nasty habit implies
> no standard, so I don't see how VFIO can find a safe location
> by itself. I'm also curious how those hidden registers are identified
> by VFIO and given a proper r/w policy today. If some sort of quirk
> is used, could that quirk mechanism be extended to also carry
> information about a vendor specific safe location? When no
> such quirk info is provided (the majority case), VFIO would then find
> a free location to carry the new cap.

See the above commit: rather than quirks, we allow raw access to any
config space outside of the capability chain.  My preference for placing
virtual capabilities at the same offset the capability occupies on the
PF comes from my impression that the PF config space is often a template
for the VF config space.  The PF and VF are clearly not independent
devices; they share design aspects, and sometimes drivers.  Therefore
if I were a lazy engineer trying to find a place to hide a register in
config space (and ignoring vendor capabilities*), I'd probably put it
in the same place on both devices.  Thus if we maintain the same
capability footprint as the PF, we have a better chance of avoiding
such hidden registers.  It's a gamble and maybe we're overthinking it,
but this has always been a concern when adding virtual capabilities to
a physical device.  We can always fall back to an approach where we
simply find free space.  Thanks,

Alex

* ISTR the Broadcom device implemented the hidden register in standard
  config space, which was otherwise entirely packed, ie. there was no
  room for the register to be implemented as a vendor cap.