linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Raj, Ashok" <ashok.raj@intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Jiang, Dave" <dave.jiang@intel.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	"vkoul@kernel.org" <vkoul@kernel.org>,
	"Dey, Megha" <megha.dey@intel.com>,
	"maz@kernel.org" <maz@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"parav@mellanox.com" <parav@mellanox.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"netanelg@mellanox.com" <netanelg@mellanox.com>,
	"shahafs@mellanox.com" <shahafs@mellanox.com>,
	"yan.y.zhao@linux.intel.com" <yan.y.zhao@linux.intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Ortiz, Samuel" <samuel.ortiz@intel.com>,
	"Hossain, Mona" <mona.hossain@intel.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Ashok Raj <ashok.raj@intel.com>
Subject: Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection
Date: Mon, 9 Nov 2020 21:14:12 -0800	[thread overview]
Message-ID: <20201110051412.GA20147@otc-nc-03> (raw)
In-Reply-To: <87pn4mi23u.fsf@nanos.tec.linutronix.de>

Hi Thomas,

On Mon, Nov 09, 2020 at 11:42:29PM +0100, Thomas Gleixner wrote:
> On Mon, Nov 09 2020 at 13:30, Jason Gunthorpe wrote:
> >
> > The relavance of PASID is this:
> >
> >> Again, trap emulate does not work for IMS when the IMS store is software
> >> managed guest memory and not part of the device. And that's the whole
> >> reason why we are discussing this.
> >
> > With PASID tagged interrupts and a IOMMU interrupt remapping
> > capability that can trigger on PASID, then the platform can provide
> > the same level of security as SRIOV - the above is no problem.
> >
> > The device ensures that all DMAs and all interrupts program by the
> > guest are PASID tagged and the platform provides security by checking
> > the PASID when delivering the interrupt.
> 
> Correct.
> 
> > Intel IOMMU doesn't work this way today, but it makes alot of design
> > sense.

Approach to IMS is more of a phased approach. 

#1 Allow physical device to scale beyond limits of PCIe MSIx
   Follows current methodology for guest interrupt programming and
   evolutionary changes rather than drastic.
#2 Long term we should work together on enabling IMS in guest which
   requires changes in both HW and SW eco-system.

For #1, the immediate need is to find a way to limit guest from using IMS
due to current limitations. We have couple options.

a) CPUID based method to disallow IMS when running in a guest OS. Limiting
   use to existing virtual MSIx to guest devices. (Both you/Jason alluded)
b) We can extend DMAR table to have a flag for opt-out. So in real platform
   this flag is clear and in guest VMM will ensure vDMAR will have this flag
   set. Along the lines as Jason alluded, platform level and via ACPI
   methods. We have similar use for x2apic_optout today.

Think a) is probably more generic.

For #2 Long term goal of allowing IMS in guest for devices that require
them. This requires some extensive eco-system enabling. 

- Extending HW to understand PASID-tagged interrupt messages.
- Appropriate extensions to IOMMU to enforce such PASID based isolation.

From SW improvements:

- Hypercall to retrieve addr/data from host
- Ensure SW can provide guarantee that the interrupt address range will not
  be mapped in process space when SVM is in play. Otherwise its hard to
  distinguish between DMA and Interrupt. OS needs to opt-in to this
  behavior. Today we ensure IOVA space has this 0xFEExxxxx range carve out
  of the IOVA space.


Devices such as idxd that do not have these entries on page-boundaries for
isolation to permit direct programming from GuestOS will continue to use
trap-emulate as used today.

In the end, virtualizing IMS requires eco-system collaboration, and we are
very open to change hw when all the relevant pieces are in place.

Until then, IMS will be restricted to host VMM only, and we can use the
methods above to prevent IMS in guest and continue to use the legacy
virtual MSIx.

> 
> Right.
> 
> > Otherwise the interrupt is effectively delivered to the hypervisor. A
> > secure device can *never* allow a guest to specify an addr/data pair
> > for a non-PASID tagged TLP, so the device cannot offer IMS to the
> > guest.
> 
> Ok. Let me summarize the current state of supported scenarios:
> 
>  1) SRIOV works with any form of IMS storage because it does not require
>     PASID and the VF devices have unique requester ids, which allows the
>     remap unit to sanity check the message.
> 
>  2) SIOV with IMS when the hypervisor can manage the IMS store
>     exclusively.

Today this is true for all interrupt types, MSI/MSIx/IMS.

> 
> So #2 prevents a device which handles IMS storage in queue memory to
> utilize IMS for SIOV in a guest because the hypervisor cannot manage the
> IMS message store and the guest can write arbitrary crap to it which
> violates the isolation principle.
> 
> And here is the relevant part of the SIOV spec:
> 
>  "IMS is managed by host driver software and is not accessible directly
>   from guest or user-mode drivers.
> 
>   Within the device, IMS storage is not accessible from the ADIs. ADIs
>   can request interrupt generation only through the device’s ‘Interrupt
>   Message Generation Logic’, which allows an ADI to only generate
>   interrupt messages that are associated with that specific ADI. These
>   restrictions ensure that the host driver has complete control over
>   which interrupt messages can be generated by each ADI.
> 
>   On Intel 64 architecture platforms, message signaled interrupts are
>   issued as DWORD size untranslated memory writes without a PASID TLP
>   Prefix, to address range 0xFEExxxxx. Since all memory requests
>   generated by ADIs include a PASID TLP Prefix, it is not possible for
>   an ADI to generate a DMA write that would be interpreted by the
>   platform as an interrupt message."
> 
> That's the reductio ad absurdum for this sentence in the first paragraph
> of the preceding chapter describing the concept of IMS:
> 
>   "IMS enables devices to store the interrupt messages for ADIs in a
>    device-specific optimized manner without the scalability restrictions
>    of the PCI Express defined MSI-X capability."
> 
> "Device-specific optimized manner" is either wishful thinking or
> marketing induced verbal diarrhoea.

No comment on the adjectives above :-)

> 
> The current specification puts massive restrictions on IMS storage which
> are _not_ allowing to optimize it in a device specific manner as
> demonstrated in this discussion.

IMS doesn't restrict this optimization, but to allow it requires more OS support as
you had mentioned.

> 
> It also precludes obvious use cases like passing a full device to a
> guest and let the guest manage SIOV subdevices for containers or nested
> guests.
> 
> TBH, to me this is just another hastily cobbled together half thought
> out misfeature cast in silicon. The proposed software support is
> following the exactly same principle.

Current IMS support adds incremental feature capability. Works pretty much
following everything that was created for MSIx, but just adds some device
flexibility. 

Here are some reasons why PASID isn't used today for tagging interrupts.

Interrupt messages (as specified by MSI/MSI-X in PCI specification) are 
currently defined as DWORD DMA writes to a platform/architecture specific 
address (0xFEExxxxx on Intel platforms). Existing root-complexes detect
DWORD writes to 0xFEExxxxx (without a PASID in the transaction) as interrupt 
messages and route them to interrupt-remapping logic (as opposed to other 
DMA requests that are routed to IOMMU's DMA remapping logic). 

There are multiple tools (such as logic analyzers) and OEM test validation 
harnesses that depend on such DWORD sized DMA writes with no PASID as interrupt
messages. One of the feedback we had received in the development of the
specification was to avoid impacting such tools irrespective of MSI-X or IMS 
was used for interrupt message storage (on the wire they follow the same format), 
and also to ensure interoperability of devices supporting IMS across CPU vendors 
(who may not support PASID TLP prefix).  This is one reason that led to interrupts 
from IMS to not use PASID (and match the wire format of MSI/MSI-X generated interrupts). 
The other problem was disambiguation between DMA to SVM v/s interrupts.

> 
> So before we go anywhere with this, I want to see a proper way forward
> to support _all_ sensible use cases and to fulfil the promise of
> "device-specific optimized manner" at the conceptual and specification
> and also at the code level.
> 
> I'm not at all interested to rush in support for a half baken Intel
> centric solution which other people have to clean up after the fact
> (again).

Intel had published the specification almost 2 years back and have
comprehended all the feedback received from the ecosystem 
(both open-source and others), along with offering the specification 
to be implemented by any vendors (both device and CPU vendors). 
There are few device vendors who are implementing to the spec already and 
are being explored for support by other CPU vendors

Cheers,
Ashok

  reply	other threads:[~2020-11-10  5:14 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-30 18:50 [PATCH v4 00/17] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
2020-10-30 18:50 ` [PATCH v4 01/17] irqchip: Add IMS (Interrupt Message Store) driver Dave Jiang
2020-10-30 22:01   ` Thomas Gleixner
2020-10-30 18:51 ` [PATCH v4 02/17] iommu/vt-d: Add DEV-MSI support Dave Jiang
2020-10-30 20:31   ` Thomas Gleixner
2020-10-30 20:52     ` Dave Jiang
2020-10-30 18:51 ` [PATCH v4 03/17] dmaengine: idxd: add theory of operation documentation for idxd mdev Dave Jiang
2020-10-30 18:51 ` [PATCH v4 04/17] dmaengine: idxd: add support for readonly config devices Dave Jiang
2020-10-30 18:51 ` [PATCH v4 05/17] dmaengine: idxd: add interrupt handle request support Dave Jiang
2020-10-30 18:51 ` [PATCH v4 06/17] PCI: add SIOV and IMS capability detection Dave Jiang
2020-10-30 19:51   ` Bjorn Helgaas
2020-10-30 21:20     ` Dave Jiang
2020-10-30 21:50       ` Bjorn Helgaas
2020-10-30 22:45       ` Jason Gunthorpe
2020-10-30 22:49         ` Dave Jiang
2020-11-02 13:21           ` Jason Gunthorpe
2020-11-03  2:49             ` Tian, Kevin
2020-11-03 12:43               ` Jason Gunthorpe
2020-11-04  3:41                 ` Tian, Kevin
2020-11-04 12:40                   ` Jason Gunthorpe
2020-11-04 13:34                     ` Tian, Kevin
2020-11-04 13:54                       ` Jason Gunthorpe
2020-11-06  9:48                         ` Tian, Kevin
2020-11-06 13:14                           ` Jason Gunthorpe
2020-11-06 16:48                             ` Raj, Ashok
2020-11-06 17:51                               ` Jason Gunthorpe
2020-11-06 23:47                                 ` Dan Williams
2020-11-07  0:12                                   ` Jason Gunthorpe
2020-11-07  1:42                                     ` Dan Williams
2020-11-08 18:11                                     ` Raj, Ashok
2020-11-08 18:34                                       ` David Woodhouse
2020-11-08 23:25                                         ` Raj, Ashok
2020-11-10 14:19                                           ` Raj, Ashok
2020-11-10 14:41                                             ` David Woodhouse
2020-11-08 23:41                                       ` Jason Gunthorpe
2020-11-09  0:05                                         ` Raj, Ashok
2020-11-08 18:47                                     ` Thomas Gleixner
2020-11-08 19:36                                       ` David Woodhouse
2020-11-08 22:47                                         ` Thomas Gleixner
2020-11-08 23:29                                           ` Jason Gunthorpe
2020-11-11 15:41                                         ` Christoph Hellwig
2020-11-11 16:09                                           ` Raj, Ashok
2020-11-11 22:27                                             ` Thomas Gleixner
2020-11-11 23:03                                               ` Raj, Ashok
2020-11-12  1:13                                                 ` Thomas Gleixner
2020-11-12 13:10                                                 ` Jason Gunthorpe
2020-11-08 23:23                                       ` Jason Gunthorpe
2020-11-08 23:36                                         ` Raj, Ashok
2020-11-09  7:37                                         ` Tian, Kevin
2020-11-09 16:46                                           ` Jason Gunthorpe
2020-11-08 23:58                                       ` Raj, Ashok
2020-11-09  7:59                                         ` Tian, Kevin
2020-11-09 11:21                                         ` Thomas Gleixner
2020-11-09 17:30                                           ` Jason Gunthorpe
2020-11-09 22:40                                             ` Raj, Ashok
2020-11-09 22:42                                             ` Thomas Gleixner
2020-11-10  5:14                                               ` Raj, Ashok [this message]
2020-11-10 10:27                                                 ` Thomas Gleixner
2020-11-10 14:13                                                   ` Raj, Ashok
2020-11-10 14:23                                                     ` Jason Gunthorpe
2020-11-11  2:17                                                       ` Tian, Kevin
2020-11-12 13:46                                                         ` Jason Gunthorpe
2020-11-11  7:14                                                     ` Tian, Kevin
2020-11-12 19:32                                                       ` Konrad Rzeszutek Wilk
2020-11-12 22:42                                                         ` Thomas Gleixner
2020-11-13  2:42                                                           ` Tian, Kevin
2020-11-13 12:57                                                             ` Jason Gunthorpe
2020-11-13 13:32                                                             ` Thomas Gleixner
2020-11-13 16:12                                                               ` Luck, Tony
2020-11-13 17:38                                                                 ` Raj, Ashok
2020-11-14 10:34                                                           ` Christoph Hellwig
2020-11-14 21:18                                                             ` Raj, Ashok
2020-11-15 11:26                                                               ` Thomas Gleixner
2020-11-15 19:31                                                                 ` Raj, Ashok
2020-11-15 22:11                                                                   ` Thomas Gleixner
2020-11-16  0:22                                                                     ` Raj, Ashok
2020-11-16  7:31                                                                       ` Tian, Kevin
2020-11-16 15:46                                                                         ` Jason Gunthorpe
2020-11-16 17:56                                                                           ` Thomas Gleixner
2020-11-16 18:02                                                                             ` Jason Gunthorpe
2020-11-16 20:37                                                                               ` Thomas Gleixner
2020-11-16 23:51                                                                               ` Tian, Kevin
2020-11-17  9:21                                                                                 ` Thomas Gleixner
2020-11-16  8:25                                                               ` Christoph Hellwig
2020-11-10 14:19                                                 ` Jason Gunthorpe
2020-11-11  2:35                                                   ` Tian, Kevin
2020-11-08 21:18                             ` Thomas Gleixner
2020-11-08 22:09                               ` David Woodhouse
2020-11-08 22:52                                 ` Thomas Gleixner
2020-11-07  0:32                           ` Thomas Gleixner
2020-11-09  5:25                             ` Tian, Kevin
2020-10-30 18:51 ` [PATCH v4 07/17] dmaengine: idxd: add IMS support in base driver Dave Jiang
2020-10-30 18:51 ` [PATCH v4 08/17] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang
2020-10-30 18:51 ` [PATCH v4 09/17] dmaengine: idxd: add basic mdev registration and helper functions Dave Jiang
2020-10-30 18:51 ` [PATCH v4 10/17] dmaengine: idxd: add emulation rw routines Dave Jiang
2020-10-30 18:52 ` [PATCH v4 11/17] dmaengine: idxd: prep for virtual device commands Dave Jiang
2020-10-30 18:52 ` [PATCH v4 12/17] dmaengine: idxd: virtual device commands emulation Dave Jiang
2020-10-30 18:52 ` [PATCH v4 13/17] dmaengine: idxd: ims setup for the vdcm Dave Jiang
2020-10-30 21:26   ` Thomas Gleixner
2020-10-30 18:52 ` [PATCH v4 14/17] dmaengine: idxd: add mdev type as a new wq type Dave Jiang
2020-10-30 18:52 ` [PATCH v4 15/17] dmaengine: idxd: add dedicated wq mdev type Dave Jiang
2020-10-30 18:52 ` [PATCH v4 16/17] dmaengine: idxd: add new wq state for mdev Dave Jiang
2020-10-30 18:52 ` [PATCH v4 17/17] dmaengine: idxd: add error notification from host driver to mediated device Dave Jiang
2020-10-30 18:58 ` [PATCH v4 00/17] Add VFIO mediated device support and DEV-MSI support for the idxd driver Jason Gunthorpe
2020-10-30 19:13   ` Dave Jiang
2020-10-30 19:17     ` Jason Gunthorpe
2020-10-30 19:23       ` Raj, Ashok
2020-10-30 19:30         ` Jason Gunthorpe
2020-10-30 20:43           ` Raj, Ashok
2020-10-30 22:54             ` Jason Gunthorpe
2020-10-31  2:50             ` Thomas Gleixner
2020-10-31 23:53               ` Raj, Ashok
2020-11-02 13:20                 ` Jason Gunthorpe
2020-11-02 16:20                   ` Raj, Ashok
2020-11-02 17:19                     ` Jason Gunthorpe
2020-11-02 18:18                       ` Dave Jiang
2020-11-02 18:26                         ` Jason Gunthorpe
2020-11-02 18:38                           ` Dan Williams
2020-11-02 18:51                             ` Jason Gunthorpe
2020-11-02 19:26                               ` Dan Williams
2020-10-30 20:48 ` Thomas Gleixner
2020-10-30 20:59   ` Dave Jiang
2020-10-30 22:10     ` Thomas Gleixner
     [not found] <draft-875z6ekcj5.fsf@nanos.tec.linutronix.de>
2020-11-09 14:08 ` [PATCH v4 06/17] PCI: add SIOV and IMS capability detection Thomas Gleixner
2020-11-09 18:10   ` Raj, Ashok

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201110051412.GA20147@otc-nc-03 \
    --to=ashok.raj@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=baolu.lu@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=eric.auger@redhat.com \
    --cc=helgaas@kernel.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jgg@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=megha.dey@intel.com \
    --cc=mona.hossain@intel.com \
    --cc=netanelg@mellanox.com \
    --cc=parav@mellanox.com \
    --cc=pbonzini@redhat.com \
    --cc=rafael@kernel.org \
    --cc=samuel.ortiz@intel.com \
    --cc=sanjay.k.kumar@intel.com \
    --cc=shahafs@mellanox.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vkoul@kernel.org \
    --cc=yan.y.zhao@linux.intel.com \
    --cc=yi.l.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).