All of lore.kernel.org
 help / color / mirror / Atom feed
* vPASID capability for VF
@ 2023-05-09  8:34 Tian, Kevin
  2023-05-09 22:43 ` Jason Gunthorpe
  2023-05-10 17:24 ` Alex Williamson
  0 siblings, 2 replies; 21+ messages in thread
From: Tian, Kevin @ 2023-05-09  8:34 UTC (permalink / raw)
  To: Alex Williamson, Jason Gunthorpe
  Cc: kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

According to PCIe spec (7.8.9 PASID Extended Capability Structure):

  The PASID configuration of the single non-VF Function representing
  the device is also used by all VFs in the device. A PF is permitted
  to implement the PASID capability, but VFs must not implement it.

To enable PASID on VF then one open is where to locate the PASID
capability in VF's vconfig space. vfio-pci doesn't know which offset
may contain VF specific config registers. Finding such offset must
come from a device specific knowledge.

iirc we haven't closed this open in previous discussion. It's probably
right time to revisit it and agree on a solution now.

Three options on the table:

1) Use vfio-pci variant driver. Easy to support especially when the VF
   also plans to support live migration. is it overburdened otherwise?

2) Have vfio-pci maintain a table tracking hard-coded offset per
   vendor/device ID.

3) Extend PCI core to allow PF driver specifying the offset for VF

Which one is cleanest in your view?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-09  8:34 vPASID capability for VF Tian, Kevin
@ 2023-05-09 22:43 ` Jason Gunthorpe
  2023-05-09 22:57   ` Tian, Kevin
  2023-05-10 17:24 ` Alex Williamson
  1 sibling, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2023-05-09 22:43 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alex Williamson, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

On Tue, May 09, 2023 at 08:34:53AM +0000, Tian, Kevin wrote:
> According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> 
>   The PASID configuration of the single non-VF Function representing
>   the device is also used by all VFs in the device. A PF is permitted
>   to implement the PASID capability, but VFs must not implement it.
> 
> To enable PASID on VF then one open is where to locate the PASID
> capability in VF's vconfig space. vfio-pci doesn't know which offset
> may contain VF specific config registers. Finding such offset must
> come from a device specific knowledge.

Why? Can't vfio probe the cap tree and just find a gap to insert a new
cap? We already mangle the cap list, I'm not sure I see what
the problem is?

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-09 22:43 ` Jason Gunthorpe
@ 2023-05-09 22:57   ` Tian, Kevin
  2023-05-09 23:13     ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2023-05-09 22:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, May 10, 2023 6:44 AM
> 
> On Tue, May 09, 2023 at 08:34:53AM +0000, Tian, Kevin wrote:
> > According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> >
> >   The PASID configuration of the single non-VF Function representing
> >   the device is also used by all VFs in the device. A PF is permitted
> >   to implement the PASID capability, but VFs must not implement it.
> >
> > To enable PASID on VF then one open is where to locate the PASID
> > capability in VF's vconfig space. vfio-pci doesn't know which offset
> > may contain VF specific config registers. Finding such offset must
> > come from a device specific knowledge.
> 
> Why? Can't vfio probe the cap tree and just find a gap to insert a new
> cap? We already mangle the cap list, I'm not sure I see what
> the problem is?
> 

PCI config space includes not only caps, but also device specific
defined fields. e.g. Intel IGD defines offset 0xfc as a pointer to
OpRegion. I'm sure Alex can give many other examples.

So it's easy to find the gap between caps, but not easy to know
whether that gap is actually free to use.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-09 22:57   ` Tian, Kevin
@ 2023-05-09 23:13     ` Jason Gunthorpe
  2023-05-09 23:41       ` Tian, Kevin
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2023-05-09 23:13 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alex Williamson, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

On Tue, May 09, 2023 at 10:57:04PM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, May 10, 2023 6:44 AM
> > 
> > On Tue, May 09, 2023 at 08:34:53AM +0000, Tian, Kevin wrote:
> > > According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> > >
> > >   The PASID configuration of the single non-VF Function representing
> > >   the device is also used by all VFs in the device. A PF is permitted
> > >   to implement the PASID capability, but VFs must not implement it.
> > >
> > > To enable PASID on VF then one open is where to locate the PASID
> > > capability in VF's vconfig space. vfio-pci doesn't know which offset
> > > may contain VF specific config registers. Finding such offset must
> > > come from a device specific knowledge.
> > 
> > Why? Can't vfio probe the cap tree and just find a gap to insert a new
> > cap? We already mangle the cap list, I'm not sure I see what
> > the problem is?
> > 
> 
> PCI config space includes not only caps, but also device specific
> defined fields. e.g. Intel IGD defines offset 0xfc as a pointer to
> OpRegion. I'm sure Alex can give many other examples.

Do we even expose those over VIFO? I thought in general we blocked of
various parts of the config space. I keep seeing patches to unblock
parts of config space?

I'd do the reverse and say devices that want to pass parts of their
config space should have a special hook to do it and otherwise we
should sanitize and block?

eg we already have a hook to pass the opregion

> So it's easy to find the gap between caps, but not easy to know
> whether that gap is actually free to use.

Because, let's face it, this is a horrible thing to do, and the
opregion stuff is just ugly as s sin.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-09 23:13     ` Jason Gunthorpe
@ 2023-05-09 23:41       ` Tian, Kevin
  2023-05-10  0:31         ` Alex Williamson
  0 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2023-05-09 23:41 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, May 10, 2023 7:13 AM
> 
> On Tue, May 09, 2023 at 10:57:04PM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > Sent: Wednesday, May 10, 2023 6:44 AM
> > >
> > > On Tue, May 09, 2023 at 08:34:53AM +0000, Tian, Kevin wrote:
> > > > According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> > > >
> > > >   The PASID configuration of the single non-VF Function representing
> > > >   the device is also used by all VFs in the device. A PF is permitted
> > > >   to implement the PASID capability, but VFs must not implement it.
> > > >
> > > > To enable PASID on VF then one open is where to locate the PASID
> > > > capability in VF's vconfig space. vfio-pci doesn't know which offset
> > > > may contain VF specific config registers. Finding such offset must
> > > > come from a device specific knowledge.
> > >
> > > Why? Can't vfio probe the cap tree and just find a gap to insert a new
> > > cap? We already mangle the cap list, I'm not sure I see what
> > > the problem is?
> > >
> >
> > PCI config space includes not only caps, but also device specific
> > defined fields. e.g. Intel IGD defines offset 0xfc as a pointer to
> > OpRegion. I'm sure Alex can give many other examples.
> 
> Do we even expose those over VIFO? I thought in general we blocked of

Yes. I did a quick check:

/*
 * Default unassigned regions to raw read-write access.  Some devices
 * require this to function as they hide registers between the gaps in
 * config space (be2net).  Like MMIO and I/O port registers, we have
 * to trust the hardware isolation.
 */
static struct perm_bits unassigned_perms = {
	.readfn = vfio_raw_config_read,
	.writefn = vfio_raw_config_write
};

vfio_config_do_rw()
{
	...
	if (cap_id == PCI_CAP_ID_INVALID) {
		perm = &unassigned_perms;
		cap_start = *ppos;
	} ...
}

vfio_config_init()
{
	...
	memset(map, PCI_CAP_ID_BASIC, PCI_STD_HEADER_SIZEOF);
	memset(map + PCI_STD_HEADER_SIZEOF, PCI_CAP_ID_INVALID,
              		 pdev->cfg_size - PCI_STD_HEADER_SIZEOF);
	...
}

> various parts of the config space. I keep seeing patches to unblock
> parts of config space?
> 
> I'd do the reverse and say devices that want to pass parts of their
> config space should have a special hook to do it and otherwise we
> should sanitize and block?

This then may break backward compatibility. We don't know how
many devices have such hidden registers so if anyone misses a hook
immediately it cannot be assigned after we start blocking as default.

> 
> eg we already have a hook to pass the opregion
> 
> > So it's easy to find the gap between caps, but not easy to know
> > whether that gap is actually free to use.
> 
> Because, let's face it, this is a horrible thing to do, and the
> opregion stuff is just ugly as s sin.
> 

It's ugly, but that is the reality. :/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-09 23:41       ` Tian, Kevin
@ 2023-05-10  0:31         ` Alex Williamson
  2023-05-10  0:59           ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Alex Williamson @ 2023-05-10  0:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Tian, Kevin, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

On Tue, 9 May 2023 23:41:44 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Wednesday, May 10, 2023 7:13 AM
> > 
> > On Tue, May 09, 2023 at 10:57:04PM +0000, Tian, Kevin wrote:  
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > Sent: Wednesday, May 10, 2023 6:44 AM
> > > >
> > > > On Tue, May 09, 2023 at 08:34:53AM +0000, Tian, Kevin wrote:  
> > > > > According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> > > > >
> > > > >   The PASID configuration of the single non-VF Function representing
> > > > >   the device is also used by all VFs in the device. A PF is permitted
> > > > >   to implement the PASID capability, but VFs must not implement it.
> > > > >
> > > > > To enable PASID on VF then one open is where to locate the PASID
> > > > > capability in VF's vconfig space. vfio-pci doesn't know which offset
> > > > > may contain VF specific config registers. Finding such offset must
> > > > > come from a device specific knowledge.  
> > > >
> > > > Why? Can't vfio probe the cap tree and just find a gap to insert a new
> > > > cap? We already mangle the cap list, I'm not sure I see what
> > > > the problem is?
> > > >  
> > >
> > > PCI config space includes not only caps, but also device specific
> > > defined fields. e.g. Intel IGD defines offset 0xfc as a pointer to
> > > OpRegion. I'm sure Alex can give many other examples.  
> > 
> > Do we even expose those over VIFO? I thought in general we blocked of  
> 
> Yes. I did a quick check:
> 
> /*
>  * Default unassigned regions to raw read-write access.  Some devices
>  * require this to function as they hide registers between the gaps in
>  * config space (be2net).  Like MMIO and I/O port registers, we have
>  * to trust the hardware isolation.
>  */
> static struct perm_bits unassigned_perms = {
> 	.readfn = vfio_raw_config_read,
> 	.writefn = vfio_raw_config_write
> };
> 
> vfio_config_do_rw()
> {
> 	...
> 	if (cap_id == PCI_CAP_ID_INVALID) {
> 		perm = &unassigned_perms;
> 		cap_start = *ppos;
> 	} ...
> }
> 
> vfio_config_init()
> {
> 	...
> 	memset(map, PCI_CAP_ID_BASIC, PCI_STD_HEADER_SIZEOF);
> 	memset(map + PCI_STD_HEADER_SIZEOF, PCI_CAP_ID_INVALID,
>               		 pdev->cfg_size - PCI_STD_HEADER_SIZEOF);
> 	...
> }
> 
> > various parts of the config space. I keep seeing patches to unblock
> > parts of config space?
> > 
> > I'd do the reverse and say devices that want to pass parts of their
> > config space should have a special hook to do it and otherwise we
> > should sanitize and block?  
> 
> This then may break backward compatibility. We don't know how
> many devices have such hidden registers so if anyone misses a hook
> immediately it cannot be assigned after we start blocking as default.
> 
> > 
> > eg we already have a hook to pass the opregion
> >   
> > > So it's easy to find the gap between caps, but not easy to know
> > > whether that gap is actually free to use.  
> > 
> > Because, let's face it, this is a horrible thing to do, and the
> > opregion stuff is just ugly as s sin.
> >   
> 
> It's ugly, but that is the reality. :/

Have a peak at the config space of an NVIDIA GPU and tell me which of
those non-zero fields between capabilities are used as well.  Glass
houses...  ;-)

IIRC we originally needed to enable this for a Broadcom NIC that
stuffed device specific registers in un-architected config space. The
capabilities we're {un}hiding are architected things that we know are
unsupported or unsafe, the gaps, just like device specific
capabilities, we're obliged to expose for functionality.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-10  0:31         ` Alex Williamson
@ 2023-05-10  0:59           ` Jason Gunthorpe
  2023-05-10  2:16             ` Tian, Kevin
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2023-05-10  0:59 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

On Tue, May 09, 2023 at 06:31:11PM -0600, Alex Williamson wrote:

> IIRC we originally needed to enable this for a Broadcom NIC that
> stuffed device specific registers in un-architected config space. The
> capabilities we're {un}hiding are architected things that we know are
> unsupported or unsafe, the gaps, just like device specific
> capabilities, we're obliged to expose for functionality.  Thanks,

I still think that if people want to do this they should wrap their
stuff in a dvsec..

If we have no choice but to inject a PASID cap for this to work then I
don't think we should quirk every device, but punish those that don't
use DVSEC/etc

So.. If PASID injection is needed then block the unmanaged space and
add quirks for devices that really need it. Otherwise leave it
alone.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-10  0:59           ` Jason Gunthorpe
@ 2023-05-10  2:16             ` Tian, Kevin
  2023-05-10 20:39               ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2023-05-10  2:16 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Wednesday, May 10, 2023 8:59 AM
> 
> On Tue, May 09, 2023 at 06:31:11PM -0600, Alex Williamson wrote:
> 
> > IIRC we originally needed to enable this for a Broadcom NIC that
> > stuffed device specific registers in un-architected config space. The
> > capabilities we're {un}hiding are architected things that we know are
> > unsupported or unsafe, the gaps, just like device specific
> > capabilities, we're obliged to expose for functionality.  Thanks,
> 
> I still think that if people want to do this they should wrap their
> stuff in a dvsec..
> 
> If we have no choice but to inject a PASID cap for this to work then I
> don't think we should quirk every device, but punish those that don't
> use DVSEC/etc
> 
> So.. If PASID injection is needed then block the unmanaged space and
> add quirks for devices that really need it. Otherwise leave it
> alone.
> 

We don't have a control knob to hide/unhide a specific PCI cap
today. It's hardcoded with proper virtualization policy in vfio-pci.

Following current convention once vfio-pci adds the support for the
PASID cap it will be exposed if present (for VF it's the presence in PF).

So I wonder how "if PASID injection is needed" can be decided.

If we pick the policy of default blocking the unmanaged space when
the PASID cap is exposed I'm not sure whether it will break any
existing device which already has the PASID cap and hidden registers
in the unmanaged space and can be assigned successfully today. That
legacy usage just doesn't need the PASID cap.

That uncertainty leans me toward quirking every device to use PASID,
though I dislike it.



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-09  8:34 vPASID capability for VF Tian, Kevin
  2023-05-09 22:43 ` Jason Gunthorpe
@ 2023-05-10 17:24 ` Alex Williamson
  2023-05-10 20:15   ` Jason Gunthorpe
  2023-05-11  7:27   ` Tian, Kevin
  1 sibling, 2 replies; 21+ messages in thread
From: Alex Williamson @ 2023-05-10 17:24 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

On Tue, 9 May 2023 08:34:53 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> 
>   The PASID configuration of the single non-VF Function representing
>   the device is also used by all VFs in the device. A PF is permitted
>   to implement the PASID capability, but VFs must not implement it.
> 
> To enable PASID on VF then one open is where to locate the PASID
> capability in VF's vconfig space. vfio-pci doesn't know which offset
> may contain VF specific config registers. Finding such offset must
> come from a device specific knowledge.

Backup for a moment, VFs are governed by the PASID capability on the
PF.  The PASID capability exposes control registers that imply the
ability to manage various feature enable bits.  The VF owner does not
have privileges to manipulate those bits.  For example, the PASID Enable
bit should restrict the endpoint from sending TLPs with a PASID prefix,
but this can only be changed at the PF level for all associated VFs.

The protocol specified in 7.8.9.3 defines this enable bit as RW.  How do
we virtualize that?  Either it's virtualized to be read-only and we
violate the spec or we allow it to be read-write and it has no effect,
which violates the spec.

Is this capability really intended to be mirrored by software to the
VFs or do we need to expose the PASID capabilities of the VF in some
other way?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-10 17:24 ` Alex Williamson
@ 2023-05-10 20:15   ` Jason Gunthorpe
  2023-05-11  7:27   ` Tian, Kevin
  1 sibling, 0 replies; 21+ messages in thread
From: Jason Gunthorpe @ 2023-05-10 20:15 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Tian, Kevin, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

On Wed, May 10, 2023 at 11:24:49AM -0600, Alex Williamson wrote:

> Is this capability really intended to be mirrored by software to the
> VFs or do we need to expose the PASID capabilities of the VF in some
> other way?  Thanks,

The VM definately needs this information, but not really the control
parts..

If we don't put it in the config space then we need to feed it through
ACPI which would solve both problems I suppose.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-10  2:16             ` Tian, Kevin
@ 2023-05-10 20:39               ` Jason Gunthorpe
  2023-05-11  7:42                 ` Tian, Kevin
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2023-05-10 20:39 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alex Williamson, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

On Wed, May 10, 2023 at 02:16:05AM +0000, Tian, Kevin wrote:

> We don't have a control knob to hide/unhide a specific PCI cap
> today. It's hardcoded with proper virtualization policy in vfio-pci.
> 
> Following current convention once vfio-pci adds the support for the
> PASID cap it will be exposed if present (for VF it's the presence in PF).

We probably shouldn't do this - the PASID cap should only exist if the
VMM is actualy able to handle PASID throughout, and currently no VMM
does this.

So we can't just have the kernel unconditionally add the cap. There
needs to be a negotiation with the VMM

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-10 17:24 ` Alex Williamson
  2023-05-10 20:15   ` Jason Gunthorpe
@ 2023-05-11  7:27   ` Tian, Kevin
  2023-05-11 11:34     ` Baolu Lu
  2023-05-11 15:45     ` Alex Williamson
  1 sibling, 2 replies; 21+ messages in thread
From: Tian, Kevin @ 2023-05-11  7:27 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jason Gunthorpe, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, May 11, 2023 1:25 AM
> 
> On Tue, 9 May 2023 08:34:53 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> >
> >   The PASID configuration of the single non-VF Function representing
> >   the device is also used by all VFs in the device. A PF is permitted
> >   to implement the PASID capability, but VFs must not implement it.
> >
> > To enable PASID on VF then one open is where to locate the PASID
> > capability in VF's vconfig space. vfio-pci doesn't know which offset
> > may contain VF specific config registers. Finding such offset must
> > come from a device specific knowledge.
> 
> Backup for a moment, VFs are governed by the PASID capability on the
> PF.  The PASID capability exposes control registers that imply the
> ability to manage various feature enable bits.  The VF owner does not
> have privileges to manipulate those bits.  For example, the PASID Enable
> bit should restrict the endpoint from sending TLPs with a PASID prefix,
> but this can only be changed at the PF level for all associated VFs.
> 
> The protocol specified in 7.8.9.3 defines this enable bit as RW.  How do
> we virtualize that?  Either it's virtualized to be read-only and we
> violate the spec or we allow it to be read-write and it has no effect,
> which violates the spec.
> 

Currently the PASID cap is enabled by default when a device is probed
by iommu driver. Leaving it enabled in PF while guest wants it disabled
in VF is harmless. W/o proper setup in iommu side the VF cannot
do real work with PASID.

From this angle fully virtualizing it in software looks good to me.

In another thread it's suggested that enabling the PASID cap should be
opted in by device driver instead of by iommu driver.

If that happens then vfio-pci may want to call into the PF driver
when the vPASID cap is enabled in VF. If the physical PASID cap in PF
hasn't been enabled then enable it. The PF driver will track which VF's
or its own clients require the PASID cap and keep it enabled until
no one wants it.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-10 20:39               ` Jason Gunthorpe
@ 2023-05-11  7:42                 ` Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2023-05-11  7:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, May 11, 2023 4:39 AM
> 
> On Wed, May 10, 2023 at 02:16:05AM +0000, Tian, Kevin wrote:
> 
> > We don't have a control knob to hide/unhide a specific PCI cap
> > today. It's hardcoded with proper virtualization policy in vfio-pci.
> >
> > Following current convention once vfio-pci adds the support for the
> > PASID cap it will be exposed if present (for VF it's the presence in PF).
> 
> We probably shouldn't do this - the PASID cap should only exist if the
> VMM is actualy able to handle PASID throughout, and currently no VMM
> does this.
> 
> So we can't just have the kernel unconditionally add the cap. There
> needs to be a negotiation with the VMM
> 

emmm. if that is the case probably we want to convey the cap to
userspace in a separate interface. I don't think it's a good idea to
give the user inconsistent vconfig layout before and after the user
negotiates.

Probably a device feature? The VMM calls device feature ioctl to
query whether PASID is supported (together with the pasid bits)
and to enable/disable it.

This also allows the user to opt whether it wants to manage PASIDs
itself or go to a simple scheme to get them from the kernel.

The VMM is responsible for finding an offset in vconfig space, e.g.
adopting your suggestion to find a gap between caps and block hidden
registers on a device if vPASID is favored and add quirks otherwise.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-11  7:27   ` Tian, Kevin
@ 2023-05-11 11:34     ` Baolu Lu
  2023-05-12  2:59       ` Tian, Kevin
  2023-05-11 15:45     ` Alex Williamson
  1 sibling, 1 reply; 21+ messages in thread
From: Baolu Lu @ 2023-05-11 11:34 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: baolu.lu, Jason Gunthorpe, kvm, linux-kernel, Liu, Yi L,
	Nicolin Chen, Lu, Baolu

On 5/11/23 3:27 PM, Tian, Kevin wrote:
>> From: Alex Williamson<alex.williamson@redhat.com>
>> Sent: Thursday, May 11, 2023 1:25 AM
>>
>> On Tue, 9 May 2023 08:34:53 +0000
>> "Tian, Kevin"<kevin.tian@intel.com>  wrote:
>>
>>> According to PCIe spec (7.8.9 PASID Extended Capability Structure):
>>>
>>>    The PASID configuration of the single non-VF Function representing
>>>    the device is also used by all VFs in the device. A PF is permitted
>>>    to implement the PASID capability, but VFs must not implement it.
>>>
>>> To enable PASID on VF then one open is where to locate the PASID
>>> capability in VF's vconfig space. vfio-pci doesn't know which offset
>>> may contain VF specific config registers. Finding such offset must
>>> come from a device specific knowledge.
>> Backup for a moment, VFs are governed by the PASID capability on the
>> PF.  The PASID capability exposes control registers that imply the
>> ability to manage various feature enable bits.  The VF owner does not
>> have privileges to manipulate those bits.  For example, the PASID Enable
>> bit should restrict the endpoint from sending TLPs with a PASID prefix,
>> but this can only be changed at the PF level for all associated VFs.
>>
>> The protocol specified in 7.8.9.3 defines this enable bit as RW.  How do
>> we virtualize that?  Either it's virtualized to be read-only and we
>> violate the spec or we allow it to be read-write and it has no effect,
>> which violates the spec.
>>
> Currently the PASID cap is enabled by default when a device is probed
> by iommu driver. Leaving it enabled in PF while guest wants it disabled
> in VF is harmless. W/o proper setup in iommu side the VF cannot
> do real work with PASID.

[sorry for partial reply]

I am attempting to move PASID enabling/disabling out of the iommu
driver and give its control to the device driver. One puzzle thing is
that PCI spec requires PASID control bits not changed once the ATS is
is enabled. So I also need to add iommu interfaces to enable/disable
ATS on devices.

By default, the ATS is enabled by the iommu subsystem to utilize the
device TLB, the device driver needs to disable it before change the
PASID control bits and re-enable it afterwards.

I have patches in my local tree, if the pasid virtualization requires
them, I can bring it up for discussion.

Best regards,
baolu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-11  7:27   ` Tian, Kevin
  2023-05-11 11:34     ` Baolu Lu
@ 2023-05-11 15:45     ` Alex Williamson
  2023-05-12  2:52       ` Tian, Kevin
                         ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Alex Williamson @ 2023-05-11 15:45 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Jason Gunthorpe, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

On Thu, 11 May 2023 07:27:27 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Thursday, May 11, 2023 1:25 AM
> > 
> > On Tue, 9 May 2023 08:34:53 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> > >
> > >   The PASID configuration of the single non-VF Function representing
> > >   the device is also used by all VFs in the device. A PF is permitted
> > >   to implement the PASID capability, but VFs must not implement it.
> > >
> > > To enable PASID on VF then one open is where to locate the PASID
> > > capability in VF's vconfig space. vfio-pci doesn't know which offset
> > > may contain VF specific config registers. Finding such offset must
> > > come from a device specific knowledge.  
> > 
> > Backup for a moment, VFs are governed by the PASID capability on the
> > PF.  The PASID capability exposes control registers that imply the
> > ability to manage various feature enable bits.  The VF owner does not
> > have privileges to manipulate those bits.  For example, the PASID Enable
> > bit should restrict the endpoint from sending TLPs with a PASID prefix,
> > but this can only be changed at the PF level for all associated VFs.
> > 
> > The protocol specified in 7.8.9.3 defines this enable bit as RW.  How do
> > we virtualize that?  Either it's virtualized to be read-only and we
> > violate the spec or we allow it to be read-write and it has no effect,
> > which violates the spec.
> >   
> 
> Currently the PASID cap is enabled by default when a device is probed
> by iommu driver. Leaving it enabled in PF while guest wants it disabled
> in VF is harmless. W/o proper setup in iommu side the VF cannot
> do real work with PASID.
> 
> From this angle fully virtualizing it in software looks good to me.

So you're suggesting that the IOMMU setup for the VF to make use of
PASID would not occur until or unless PASID Enable is set in the
virtualized VF PASID capability and that support would be torn down
when PASID Enable is cleared?

This is still not strictly in adherence with the definition of the
PASID Enable bit which specifies that this bit controls whether the
endpoint is able to send or receive TLPs with the PASID prefix, which
clearly virtualization interacting with the IOMMU to block or allow
PASIDs from the VF RID cannot change.  Is it sufficient?

For example we can't use the vPASID capability to make any guarantees
about in-flight PASID TLPs when sequencing IOMMU operations since we
can't actually prevent VFs using PASID so long as PASID Enable is set
on the PF.

> In another thread it's suggested that enabling the PASID cap should be
> opted in by device driver instead of by iommu driver.
> 
> If that happens then vfio-pci may want to call into the PF driver
> when the vPASID cap is enabled in VF. If the physical PASID cap in PF
> hasn't been enabled then enable it. The PF driver will track which VF's
> or its own clients require the PASID cap and keep it enabled until
> no one wants it.

Why wouldn't we just require PASID support to be enabled and remain
enabled on the PF or else we don't expose the virtualized PASID
capability on the VF?

I don't particularly like the idea of allowing userspace VF drivers to
vote on the PASID Enable state of the PF driver.  The PF driver should
be able to control the policy whether PASID support is enabled,
especially given the interaction with ATS, as noted by Baolu.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-11 15:45     ` Alex Williamson
@ 2023-05-12  2:52       ` Tian, Kevin
  2023-05-17  5:22       ` Tian, Kevin
  2023-07-27  1:17       ` Tian, Kevin
  2 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2023-05-12  2:52 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jason Gunthorpe, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

> From: Alex Williamson <alex.williamson@redhat.com>
> Sent: Thursday, May 11, 2023 11:45 PM
> 
> On Thu, 11 May 2023 07:27:27 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Thursday, May 11, 2023 1:25 AM
> > >
> > > On Tue, 9 May 2023 08:34:53 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> > > >
> > > >   The PASID configuration of the single non-VF Function representing
> > > >   the device is also used by all VFs in the device. A PF is permitted
> > > >   to implement the PASID capability, but VFs must not implement it.
> > > >
> > > > To enable PASID on VF then one open is where to locate the PASID
> > > > capability in VF's vconfig space. vfio-pci doesn't know which offset
> > > > may contain VF specific config registers. Finding such offset must
> > > > come from a device specific knowledge.
> > >
> > > Backup for a moment, VFs are governed by the PASID capability on the
> > > PF.  The PASID capability exposes control registers that imply the
> > > ability to manage various feature enable bits.  The VF owner does not
> > > have privileges to manipulate those bits.  For example, the PASID Enable
> > > bit should restrict the endpoint from sending TLPs with a PASID prefix,
> > > but this can only be changed at the PF level for all associated VFs.
> > >
> > > The protocol specified in 7.8.9.3 defines this enable bit as RW.  How do
> > > we virtualize that?  Either it's virtualized to be read-only and we
> > > violate the spec or we allow it to be read-write and it has no effect,
> > > which violates the spec.
> > >
> >
> > Currently the PASID cap is enabled by default when a device is probed
> > by iommu driver. Leaving it enabled in PF while guest wants it disabled
> > in VF is harmless. W/o proper setup in iommu side the VF cannot
> > do real work with PASID.
> >
> > From this angle fully virtualizing it in software looks good to me.
> 
> So you're suggesting that the IOMMU setup for the VF to make use of
> PASID would not occur until or unless PASID Enable is set in the
> virtualized VF PASID capability and that support would be torn down
> when PASID Enable is cleared?

No that is not the case. The IOMMU setup is initiated by vIOMMU
and orthogonal to the PASID cap virtualization.

Following the current IOMMU behavior as Baolu described the guest
will always enable vPASID in the vIOMMU driver.

Even if the guest implements an driver-opt model for vPASID enabling,
in typical case the guest driver will not request vIOMMU setup for VF
PASIDs if it doesn't intend to enable vPASID cap. In this case the
physical IOMMU is left blocking VF PASIDs hence leaving PF PASID enabled
doesn't hurt.

If an insane guest driver does try to enable vIOMMU PASID (so VF PASIDs
are allowed in physical IOMMU) while leaving vPASID disabled in VF,
I'm not sure what would be the actual problem leaving PF PASID enabled.
The guest driver kind of wants to fool itself by already setting up the
fabric to allow VF PASID but then block PASID in VF itself?

> 
> This is still not strictly in adherence with the definition of the
> PASID Enable bit which specifies that this bit controls whether the
> endpoint is able to send or receive TLPs with the PASID prefix, which
> clearly virtualization interacting with the IOMMU to block or allow
> PASIDs from the VF RID cannot change.  Is it sufficient?
> 
> For example we can't use the vPASID capability to make any guarantees
> about in-flight PASID TLPs when sequencing IOMMU operations since we
> can't actually prevent VFs using PASID so long as PASID Enable is set
> on the PF.

IOMMU cares about in-flight PASID TLPs only when it's unblocked.

If it's already blocked then it doesn't matter whether VF is sending PASID
TLP or not.

btw think about the current situation. Even if vfio-pci doesn't expose
PASID cap today, it's physically enabled by iommu driver already. Then
the guest is already able to program the device to send PASID TLP's.

fully virtualizing vPASID cap just aligns with this fact. 😊

> 
> > In another thread it's suggested that enabling the PASID cap should be
> > opted in by device driver instead of by iommu driver.
> >
> > If that happens then vfio-pci may want to call into the PF driver
> > when the vPASID cap is enabled in VF. If the physical PASID cap in PF
> > hasn't been enabled then enable it. The PF driver will track which VF's
> > or its own clients require the PASID cap and keep it enabled until
> > no one wants it.
> 
> Why wouldn't we just require PASID support to be enabled and remain
> enabled on the PF or else we don't expose the virtualized PASID
> capability on the VF?
> 
> I don't particularly like the idea of allowing userspace VF drivers to
> vote on the PASID Enable state of the PF driver.  The PF driver should
> be able to control the policy whether PASID support is enabled,
> especially given the interaction with ATS, as noted by Baolu.  Thanks,
> 

that's fine. Probably I was over-thinking the world where the PASID
cap is dynamically opted by driver.

e.g. even just talking about PF assignment itself would we want vfio-pci
to keep the PASID cap disabled until the user requests to enable it?

of course it's an invalid open if we think the current policy having PASID
enabled by default in iommu driver continues to work.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-11 11:34     ` Baolu Lu
@ 2023-05-12  2:59       ` Tian, Kevin
  2023-05-12 21:01         ` Jason Gunthorpe
  0 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2023-05-12  2:59 UTC (permalink / raw)
  To: Baolu Lu, Alex Williamson
  Cc: Jason Gunthorpe, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, May 11, 2023 7:34 PM
> 
> On 5/11/23 3:27 PM, Tian, Kevin wrote:
> >> From: Alex Williamson<alex.williamson@redhat.com>
> >> Sent: Thursday, May 11, 2023 1:25 AM
> >>
> >> On Tue, 9 May 2023 08:34:53 +0000
> >> "Tian, Kevin"<kevin.tian@intel.com>  wrote:
> >>
> >>> According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> >>>
> >>>    The PASID configuration of the single non-VF Function representing
> >>>    the device is also used by all VFs in the device. A PF is permitted
> >>>    to implement the PASID capability, but VFs must not implement it.
> >>>
> >>> To enable PASID on VF then one open is where to locate the PASID
> >>> capability in VF's vconfig space. vfio-pci doesn't know which offset
> >>> may contain VF specific config registers. Finding such offset must
> >>> come from a device specific knowledge.
> >> Backup for a moment, VFs are governed by the PASID capability on the
> >> PF.  The PASID capability exposes control registers that imply the
> >> ability to manage various feature enable bits.  The VF owner does not
> >> have privileges to manipulate those bits.  For example, the PASID Enable
> >> bit should restrict the endpoint from sending TLPs with a PASID prefix,
> >> but this can only be changed at the PF level for all associated VFs.
> >>
> >> The protocol specified in 7.8.9.3 defines this enable bit as RW.  How do
> >> we virtualize that?  Either it's virtualized to be read-only and we
> >> violate the spec or we allow it to be read-write and it has no effect,
> >> which violates the spec.
> >>
> > Currently the PASID cap is enabled by default when a device is probed
> > by iommu driver. Leaving it enabled in PF while guest wants it disabled
> > in VF is harmless. W/o proper setup in iommu side the VF cannot
> > do real work with PASID.
> 
> [sorry for partial reply]
> 
> I am attempting to move PASID enabling/disabling out of the iommu
> driver and give its control to the device driver. One puzzle thing is
> that PCI spec requires PASID control bits not changed once the ATS is
> is enabled. So I also need to add iommu interfaces to enable/disable
> ATS on devices.
> 
> By default, the ATS is enabled by the iommu subsystem to utilize the
> device TLB, the device driver needs to disable it before change the
> PASID control bits and re-enable it afterwards.
> 

ATS is also relied on by PRS. Not sure how that will be affected when
ATS is dynamically turned on/off. and PRS is not tied to PASID.

Jason, is it still a strong requirement to have driver-opt model for
pasid enabling? iirc it's first raised in a discussion for an AMD GPU
scenario [1]

[1] https://lore.kernel.org/linux-pci/38a7baf4-9b3b-154b-f764-fa8cfa600858@linux.intel.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: vPASID capability for VF
  2023-05-12  2:59       ` Tian, Kevin
@ 2023-05-12 21:01         ` Jason Gunthorpe
  2023-05-17  5:09           ` Tian, Kevin
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Gunthorpe @ 2023-05-12 21:01 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Baolu Lu, Alex Williamson, kvm, linux-kernel, Liu, Yi L,
	Nicolin Chen, Lu, Baolu

On Fri, May 12, 2023 at 02:59:40AM +0000, Tian, Kevin wrote:
> > From: Baolu Lu <baolu.lu@linux.intel.com>
> > Sent: Thursday, May 11, 2023 7:34 PM
> > 
> > On 5/11/23 3:27 PM, Tian, Kevin wrote:
> > >> From: Alex Williamson<alex.williamson@redhat.com>
> > >> Sent: Thursday, May 11, 2023 1:25 AM
> > >>
> > >> On Tue, 9 May 2023 08:34:53 +0000
> > >> "Tian, Kevin"<kevin.tian@intel.com>  wrote:
> > >>
> > >>> According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> > >>>
> > >>>    The PASID configuration of the single non-VF Function representing
> > >>>    the device is also used by all VFs in the device. A PF is permitted
> > >>>    to implement the PASID capability, but VFs must not implement it.
> > >>>
> > >>> To enable PASID on VF then one open is where to locate the PASID
> > >>> capability in VF's vconfig space. vfio-pci doesn't know which offset
> > >>> may contain VF specific config registers. Finding such offset must
> > >>> come from a device specific knowledge.
> > >> Backup for a moment, VFs are governed by the PASID capability on the
> > >> PF.  The PASID capability exposes control registers that imply the
> > >> ability to manage various feature enable bits.  The VF owner does not
> > >> have privileges to manipulate those bits.  For example, the PASID Enable
> > >> bit should restrict the endpoint from sending TLPs with a PASID prefix,
> > >> but this can only be changed at the PF level for all associated VFs.
> > >>
> > >> The protocol specified in 7.8.9.3 defines this enable bit as RW.  How do
> > >> we virtualize that?  Either it's virtualized to be read-only and we
> > >> violate the spec or we allow it to be read-write and it has no effect,
> > >> which violates the spec.
> > >>
> > > Currently the PASID cap is enabled by default when a device is probed
> > > by iommu driver. Leaving it enabled in PF while guest wants it disabled
> > > in VF is harmless. W/o proper setup in iommu side the VF cannot
> > > do real work with PASID.
> > 
> > [sorry for partial reply]
> > 
> > I am attempting to move PASID enabling/disabling out of the iommu
> > driver and give its control to the device driver. One puzzle thing is
> > that PCI spec requires PASID control bits not changed once the ATS is
> > is enabled. So I also need to add iommu interfaces to enable/disable
> > ATS on devices.
> > 
> > By default, the ATS is enabled by the iommu subsystem to utilize the
> > device TLB, the device driver needs to disable it before change the
> > PASID control bits and re-enable it afterwards.
> > 
> 
> ATS is also relied on by PRS. Not sure how that will be affected when
> ATS is dynamically turned on/off. and PRS is not tied to PASID.
> 
> Jason, is it still a strong requirement to have driver-opt model for
> pasid enabling? iirc it's first raised in a discussion for an AMD GPU
> scenario [1]

It is sounding worse and worse as we go along..

AMD and ARM both have the issue that the iommu settings and domain
types depend on if the driver intends to use PASID or not. There is
some small performance win if PASID is not used and the iommu driver
knows it is not used.

We also get into some trouble with groups, I think, where it will be
hard for the iommu driver to know which mode to use when creating a
domain..

But, if the PASID control for a VF is on the PF then I think it is
hopeless. The iommu or PCI layers need to make these decisions and
drivers have to live with it. No PASID support unless the iommu turned
it on.

This still suggests there would be some driver call to the iommu side
to check that PASID is setup.

Jason

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-12 21:01         ` Jason Gunthorpe
@ 2023-05-17  5:09           ` Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2023-05-17  5:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Baolu Lu, Alex Williamson, kvm, linux-kernel, Liu, Yi L,
	Nicolin Chen, Lu, Baolu

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, May 13, 2023 5:02 AM
> 
> On Fri, May 12, 2023 at 02:59:40AM +0000, Tian, Kevin wrote:
> > > From: Baolu Lu <baolu.lu@linux.intel.com>
> > > Sent: Thursday, May 11, 2023 7:34 PM
> > >
> > > On 5/11/23 3:27 PM, Tian, Kevin wrote:
> > > >> From: Alex Williamson<alex.williamson@redhat.com>
> > > >> Sent: Thursday, May 11, 2023 1:25 AM
> > > >>
> > > >> On Tue, 9 May 2023 08:34:53 +0000
> > > >> "Tian, Kevin"<kevin.tian@intel.com>  wrote:
> > > >>
> > > >>> According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> > > >>>
> > > >>>    The PASID configuration of the single non-VF Function representing
> > > >>>    the device is also used by all VFs in the device. A PF is permitted
> > > >>>    to implement the PASID capability, but VFs must not implement it.
> > > >>>
> > > >>> To enable PASID on VF then one open is where to locate the PASID
> > > >>> capability in VF's vconfig space. vfio-pci doesn't know which offset
> > > >>> may contain VF specific config registers. Finding such offset must
> > > >>> come from a device specific knowledge.
> > > >> Backup for a moment, VFs are governed by the PASID capability on the
> > > >> PF.  The PASID capability exposes control registers that imply the
> > > >> ability to manage various feature enable bits.  The VF owner does not
> > > >> have privileges to manipulate those bits.  For example, the PASID
> Enable
> > > >> bit should restrict the endpoint from sending TLPs with a PASID prefix,
> > > >> but this can only be changed at the PF level for all associated VFs.
> > > >>
> > > >> The protocol specified in 7.8.9.3 defines this enable bit as RW.  How do
> > > >> we virtualize that?  Either it's virtualized to be read-only and we
> > > >> violate the spec or we allow it to be read-write and it has no effect,
> > > >> which violates the spec.
> > > >>
> > > > Currently the PASID cap is enabled by default when a device is probed
> > > > by iommu driver. Leaving it enabled in PF while guest wants it disabled
> > > > in VF is harmless. W/o proper setup in iommu side the VF cannot
> > > > do real work with PASID.
> > >
> > > [sorry for partial reply]
> > >
> > > I am attempting to move PASID enabling/disabling out of the iommu
> > > driver and give its control to the device driver. One puzzle thing is
> > > that PCI spec requires PASID control bits not changed once the ATS is
> > > is enabled. So I also need to add iommu interfaces to enable/disable
> > > ATS on devices.
> > >
> > > By default, the ATS is enabled by the iommu subsystem to utilize the
> > > device TLB, the device driver needs to disable it before change the
> > > PASID control bits and re-enable it afterwards.
> > >
> >
> > ATS is also relied on by PRS. Not sure how that will be affected when
> > ATS is dynamically turned on/off. and PRS is not tied to PASID.
> >
> > Jason, is it still a strong requirement to have driver-opt model for
> > pasid enabling? iirc it's first raised in a discussion for an AMD GPU
> > scenario [1]
> 
> It is sounding worse and worse as we go along..
> 
> AMD and ARM both have the issue that the iommu settings and domain
> types depend on if the driver intends to use PASID or not. There is
> some small performance win if PASID is not used and the iommu driver
> knows it is not used.
> 
> We also get into some trouble with groups, I think, where it will be
> hard for the iommu driver to know which mode to use when creating a
> domain..
> 
> But, if the PASID control for a VF is on the PF then I think it is
> hopeless. The iommu or PCI layers need to make these decisions and
> drivers have to live with it. No PASID support unless the iommu turned
> it on.
> 
> This still suggests there would be some driver call to the iommu side
> to check that PASID is setup.
> 

yes, that still makes sense.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-11 15:45     ` Alex Williamson
  2023-05-12  2:52       ` Tian, Kevin
@ 2023-05-17  5:22       ` Tian, Kevin
  2023-07-27  1:17       ` Tian, Kevin
  2 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2023-05-17  5:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jason Gunthorpe, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

> From: Tian, Kevin
> Sent: Friday, May 12, 2023 10:53 AM
> 
> > From: Alex Williamson <alex.williamson@redhat.com>
> > Sent: Thursday, May 11, 2023 11:45 PM
> >
> > On Thu, 11 May 2023 07:27:27 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >
> > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > Sent: Thursday, May 11, 2023 1:25 AM
> > > >
> > > > On Tue, 9 May 2023 08:34:53 +0000
> > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > >
> > > > > According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> > > > >
> > > > >   The PASID configuration of the single non-VF Function representing
> > > > >   the device is also used by all VFs in the device. A PF is permitted
> > > > >   to implement the PASID capability, but VFs must not implement it.
> > > > >
> > > > > To enable PASID on VF then one open is where to locate the PASID
> > > > > capability in VF's vconfig space. vfio-pci doesn't know which offset
> > > > > may contain VF specific config registers. Finding such offset must
> > > > > come from a device specific knowledge.
> > > >
> > > > Backup for a moment, VFs are governed by the PASID capability on the
> > > > PF.  The PASID capability exposes control registers that imply the
> > > > ability to manage various feature enable bits.  The VF owner does not
> > > > have privileges to manipulate those bits.  For example, the PASID
> Enable
> > > > bit should restrict the endpoint from sending TLPs with a PASID prefix,
> > > > but this can only be changed at the PF level for all associated VFs.
> > > >
> > > > The protocol specified in 7.8.9.3 defines this enable bit as RW.  How do
> > > > we virtualize that?  Either it's virtualized to be read-only and we
> > > > violate the spec or we allow it to be read-write and it has no effect,
> > > > which violates the spec.
> > > >
> > >
> > > Currently the PASID cap is enabled by default when a device is probed
> > > by iommu driver. Leaving it enabled in PF while guest wants it disabled
> > > in VF is harmless. W/o proper setup in iommu side the VF cannot
> > > do real work with PASID.
> > >
> > > From this angle fully virtualizing it in software looks good to me.
> >
> > So you're suggesting that the IOMMU setup for the VF to make use of
> > PASID would not occur until or unless PASID Enable is set in the
> > virtualized VF PASID capability and that support would be torn down
> > when PASID Enable is cleared?
> 
> No that is not the case. The IOMMU setup is initiated by vIOMMU
> and orthogonal to the PASID cap virtualization.
> 
> Following the current IOMMU behavior as Baolu described the guest
> will always enable vPASID in the vIOMMU driver.
> 
> Even if the guest implements an driver-opt model for vPASID enabling,
> in typical case the guest driver will not request vIOMMU setup for VF
> PASIDs if it doesn't intend to enable vPASID cap. In this case the
> physical IOMMU is left blocking VF PASIDs hence leaving PF PASID enabled
> doesn't hurt.
> 
> If an insane guest driver does try to enable vIOMMU PASID (so VF PASIDs
> are allowed in physical IOMMU) while leaving vPASID disabled in VF,
> I'm not sure what would be the actual problem leaving PF PASID enabled.
> The guest driver kind of wants to fool itself by already setting up the
> fabric to allow VF PASID but then block PASID in VF itself?
> 
> >
> > This is still not strictly in adherence with the definition of the
> > PASID Enable bit which specifies that this bit controls whether the
> > endpoint is able to send or receive TLPs with the PASID prefix, which
> > clearly virtualization interacting with the IOMMU to block or allow
> > PASIDs from the VF RID cannot change.  Is it sufficient?
> >
> > For example we can't use the vPASID capability to make any guarantees
> > about in-flight PASID TLPs when sequencing IOMMU operations since we
> > can't actually prevent VFs using PASID so long as PASID Enable is set
> > on the PF.
> 
> IOMMU cares about in-flight PASID TLPs only when it's unblocked.
> 
> If it's already blocked then it doesn't matter whether VF is sending PASID
> TLP or not.
> 
> btw think about the current situation. Even if vfio-pci doesn't expose
> PASID cap today, it's physically enabled by iommu driver already. Then
> the guest is already able to program the device to send PASID TLP's.
> 
> fully virtualizing vPASID cap just aligns with this fact. 😊
> 

Hi, Alex,

If you are OK with above explanation we can continue discussing how
to expose the PASID cap for VF.

At the start I listed several options to quirk the offset in the kernel.

Jason suggested that the kernel should not expose the cap unconditionally.

Then I proposed it could be done via a device feature and leave the
offset to be quirked in VMM. [1] Is it a reasonable way to go?

Thanks
Kevin

[1]  https://lore.kernel.org/lkml/BN9PR11MB5276AE43183A3AA6AB806A398C749@BN9PR11MB5276.namprd11.prod.outlook.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: vPASID capability for VF
  2023-05-11 15:45     ` Alex Williamson
  2023-05-12  2:52       ` Tian, Kevin
  2023-05-17  5:22       ` Tian, Kevin
@ 2023-07-27  1:17       ` Tian, Kevin
  2 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2023-07-27  1:17 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jason Gunthorpe, kvm, linux-kernel, Liu, Yi L, Nicolin Chen, Lu, Baolu

Gentle ping in case it's buried in your mailbox. 😊

we'd like to settle it down soon as the conclusion may even affect PF (e.g. if all
agree that device feature is the right interface to go).

> From: Tian, Kevin
> Sent: Wednesday, May 17, 2023 1:22 PM
> 
> > From: Tian, Kevin
> > Sent: Friday, May 12, 2023 10:53 AM
> >
> > > From: Alex Williamson <alex.williamson@redhat.com>
> > > Sent: Thursday, May 11, 2023 11:45 PM
> > >
> > > On Thu, 11 May 2023 07:27:27 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > > From: Alex Williamson <alex.williamson@redhat.com>
> > > > > Sent: Thursday, May 11, 2023 1:25 AM
> > > > >
> > > > > On Tue, 9 May 2023 08:34:53 +0000
> > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > > > >
> > > > > > According to PCIe spec (7.8.9 PASID Extended Capability Structure):
> > > > > >
> > > > > >   The PASID configuration of the single non-VF Function representing
> > > > > >   the device is also used by all VFs in the device. A PF is permitted
> > > > > >   to implement the PASID capability, but VFs must not implement it.
> > > > > >
> > > > > > To enable PASID on VF then one open is where to locate the PASID
> > > > > > capability in VF's vconfig space. vfio-pci doesn't know which offset
> > > > > > may contain VF specific config registers. Finding such offset must
> > > > > > come from a device specific knowledge.
> > > > >
> > > > > Backup for a moment, VFs are governed by the PASID capability on
> the
> > > > > PF.  The PASID capability exposes control registers that imply the
> > > > > ability to manage various feature enable bits.  The VF owner does not
> > > > > have privileges to manipulate those bits.  For example, the PASID
> > Enable
> > > > > bit should restrict the endpoint from sending TLPs with a PASID prefix,
> > > > > but this can only be changed at the PF level for all associated VFs.
> > > > >
> > > > > The protocol specified in 7.8.9.3 defines this enable bit as RW.  How
> do
> > > > > we virtualize that?  Either it's virtualized to be read-only and we
> > > > > violate the spec or we allow it to be read-write and it has no effect,
> > > > > which violates the spec.
> > > > >
> > > >
> > > > Currently the PASID cap is enabled by default when a device is probed
> > > > by iommu driver. Leaving it enabled in PF while guest wants it disabled
> > > > in VF is harmless. W/o proper setup in iommu side the VF cannot
> > > > do real work with PASID.
> > > >
> > > > From this angle fully virtualizing it in software looks good to me.
> > >
> > > So you're suggesting that the IOMMU setup for the VF to make use of
> > > PASID would not occur until or unless PASID Enable is set in the
> > > virtualized VF PASID capability and that support would be torn down
> > > when PASID Enable is cleared?
> >
> > No that is not the case. The IOMMU setup is initiated by vIOMMU
> > and orthogonal to the PASID cap virtualization.
> >
> > Following the current IOMMU behavior as Baolu described the guest
> > will always enable vPASID in the vIOMMU driver.
> >
> > Even if the guest implements an driver-opt model for vPASID enabling,
> > in typical case the guest driver will not request vIOMMU setup for VF
> > PASIDs if it doesn't intend to enable vPASID cap. In this case the
> > physical IOMMU is left blocking VF PASIDs hence leaving PF PASID enabled
> > doesn't hurt.
> >
> > If an insane guest driver does try to enable vIOMMU PASID (so VF PASIDs
> > are allowed in physical IOMMU) while leaving vPASID disabled in VF,
> > I'm not sure what would be the actual problem leaving PF PASID enabled.
> > The guest driver kind of wants to fool itself by already setting up the
> > fabric to allow VF PASID but then block PASID in VF itself?
> >
> > >
> > > This is still not strictly in adherence with the definition of the
> > > PASID Enable bit which specifies that this bit controls whether the
> > > endpoint is able to send or receive TLPs with the PASID prefix, which
> > > clearly virtualization interacting with the IOMMU to block or allow
> > > PASIDs from the VF RID cannot change.  Is it sufficient?
> > >
> > > For example we can't use the vPASID capability to make any guarantees
> > > about in-flight PASID TLPs when sequencing IOMMU operations since we
> > > can't actually prevent VFs using PASID so long as PASID Enable is set
> > > on the PF.
> >
> > IOMMU cares about in-flight PASID TLPs only when it's unblocked.
> >
> > If it's already blocked then it doesn't matter whether VF is sending PASID
> > TLP or not.
> >
> > btw think about the current situation. Even if vfio-pci doesn't expose
> > PASID cap today, it's physically enabled by iommu driver already. Then
> > the guest is already able to program the device to send PASID TLP's.
> >
> > fully virtualizing vPASID cap just aligns with this fact. 😊
> >
> 
> Hi, Alex,
> 
> If you are OK with above explanation we can continue discussing how
> to expose the PASID cap for VF.
> 
> At the start I listed several options to quirk the offset in the kernel.
> 
> Jason suggested that the kernel should not expose the cap unconditionally.
> 
> Then I proposed it could be done via a device feature and leave the
> offset to be quirked in VMM. [1] Is it a reasonable way to go?
> 
> Thanks
> Kevin
> 
> [1]
> https://lore.kernel.org/lkml/BN9PR11MB5276AE43183A3AA6AB806A398C74
> 9@BN9PR11MB5276.namprd11.prod.outlook.com/

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-07-27  1:17 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-09  8:34 vPASID capability for VF Tian, Kevin
2023-05-09 22:43 ` Jason Gunthorpe
2023-05-09 22:57   ` Tian, Kevin
2023-05-09 23:13     ` Jason Gunthorpe
2023-05-09 23:41       ` Tian, Kevin
2023-05-10  0:31         ` Alex Williamson
2023-05-10  0:59           ` Jason Gunthorpe
2023-05-10  2:16             ` Tian, Kevin
2023-05-10 20:39               ` Jason Gunthorpe
2023-05-11  7:42                 ` Tian, Kevin
2023-05-10 17:24 ` Alex Williamson
2023-05-10 20:15   ` Jason Gunthorpe
2023-05-11  7:27   ` Tian, Kevin
2023-05-11 11:34     ` Baolu Lu
2023-05-12  2:59       ` Tian, Kevin
2023-05-12 21:01         ` Jason Gunthorpe
2023-05-17  5:09           ` Tian, Kevin
2023-05-11 15:45     ` Alex Williamson
2023-05-12  2:52       ` Tian, Kevin
2023-05-17  5:22       ` Tian, Kevin
2023-07-27  1:17       ` Tian, Kevin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.