From: "Raj, Ashok" <firstname.lastname@example.org> To: Thomas Gleixner <email@example.com> Cc: Jason Gunthorpe <firstname.lastname@example.org>, Dan Williams <email@example.com>, "Tian, Kevin" <firstname.lastname@example.org>, "Jiang, Dave" <email@example.com>, Bjorn Helgaas <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "Dey, Megha" <email@example.com>, "firstname.lastname@example.org" <email@example.com>, "firstname.lastname@example.org" <email@example.com>, "firstname.lastname@example.org" <email@example.com>, "Pan, Jacob jun" <firstname.lastname@example.org>, "Liu, Yi L" <email@example.com>, "Lu, Baolu" <firstname.lastname@example.org>, "Kumar, Sanjay K" <email@example.com>, "Luck, Tony" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "Ortiz, Samuel" <email@example.com>, "Hossain, Mona" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, "email@example.com" <firstname.lastname@example.org>, Ashok Raj <email@example.com> Subject: Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection Date: Mon, 9 Nov 2020 21:14:12 -0800 [thread overview] Message-ID: <20201110051412.GA20147@otc-nc-03> (raw) In-Reply-To: <firstname.lastname@example.org> Hi Thomas, On Mon, Nov 09, 2020 at 11:42:29PM +0100, Thomas Gleixner wrote: > On Mon, Nov 09 2020 at 13:30, Jason Gunthorpe wrote: > > > > The relavance of PASID is this: > > > >> Again, trap emulate does not work for IMS when the IMS store is software > >> managed guest memory and not part of the device. And that's the whole > >> reason why we are discussing this. > > > > With PASID tagged interrupts and a IOMMU interrupt remapping > > capability that can trigger on PASID, then the platform can provide > > the same level of security as SRIOV - the above is no problem. > > > > The device ensures that all DMAs and all interrupts program by the > > guest are PASID tagged and the platform provides security by checking > > the PASID when delivering the interrupt. > > Correct. > > > Intel IOMMU doesn't work this way today, but it makes alot of design > > sense. Approach to IMS is more of a phased approach. #1 Allow physical device to scale beyond limits of PCIe MSIx Follows current methodology for guest interrupt programming and evolutionary changes rather than drastic. #2 Long term we should work together on enabling IMS in guest which requires changes in both HW and SW eco-system. For #1, the immediate need is to find a way to limit guest from using IMS due to current limitations. We have couple options. a) CPUID based method to disallow IMS when running in a guest OS. Limiting use to existing virtual MSIx to guest devices. (Both you/Jason alluded) b) We can extend DMAR table to have a flag for opt-out. So in real platform this flag is clear and in guest VMM will ensure vDMAR will have this flag set. Along the lines as Jason alluded, platform level and via ACPI methods. We have similar use for x2apic_optout today. Think a) is probably more generic. For #2 Long term goal of allowing IMS in guest for devices that require them. This requires some extensive eco-system enabling. - Extending HW to understand PASID-tagged interrupt messages. - Appropriate extensions to IOMMU to enforce such PASID based isolation. From SW improvements: - Hypercall to retrieve addr/data from host - Ensure SW can provide guarantee that the interrupt address range will not be mapped in process space when SVM is in play. Otherwise its hard to distinguish between DMA and Interrupt. OS needs to opt-in to this behavior. Today we ensure IOVA space has this 0xFEExxxxx range carve out of the IOVA space. Devices such as idxd that do not have these entries on page-boundaries for isolation to permit direct programming from GuestOS will continue to use trap-emulate as used today. In the end, virtualizing IMS requires eco-system collaboration, and we are very open to change hw when all the relevant pieces are in place. Until then, IMS will be restricted to host VMM only, and we can use the methods above to prevent IMS in guest and continue to use the legacy virtual MSIx. > > Right. > > > Otherwise the interrupt is effectively delivered to the hypervisor. A > > secure device can *never* allow a guest to specify an addr/data pair > > for a non-PASID tagged TLP, so the device cannot offer IMS to the > > guest. > > Ok. Let me summarize the current state of supported scenarios: > > 1) SRIOV works with any form of IMS storage because it does not require > PASID and the VF devices have unique requester ids, which allows the > remap unit to sanity check the message. > > 2) SIOV with IMS when the hypervisor can manage the IMS store > exclusively. Today this is true for all interrupt types, MSI/MSIx/IMS. > > So #2 prevents a device which handles IMS storage in queue memory to > utilize IMS for SIOV in a guest because the hypervisor cannot manage the > IMS message store and the guest can write arbitrary crap to it which > violates the isolation principle. > > And here is the relevant part of the SIOV spec: > > "IMS is managed by host driver software and is not accessible directly > from guest or user-mode drivers. > > Within the device, IMS storage is not accessible from the ADIs. ADIs > can request interrupt generation only through the device’s ‘Interrupt > Message Generation Logic’, which allows an ADI to only generate > interrupt messages that are associated with that specific ADI. These > restrictions ensure that the host driver has complete control over > which interrupt messages can be generated by each ADI. > > On Intel 64 architecture platforms, message signaled interrupts are > issued as DWORD size untranslated memory writes without a PASID TLP > Prefix, to address range 0xFEExxxxx. Since all memory requests > generated by ADIs include a PASID TLP Prefix, it is not possible for > an ADI to generate a DMA write that would be interpreted by the > platform as an interrupt message." > > That's the reductio ad absurdum for this sentence in the first paragraph > of the preceding chapter describing the concept of IMS: > > "IMS enables devices to store the interrupt messages for ADIs in a > device-specific optimized manner without the scalability restrictions > of the PCI Express defined MSI-X capability." > > "Device-specific optimized manner" is either wishful thinking or > marketing induced verbal diarrhoea. No comment on the adjectives above :-) > > The current specification puts massive restrictions on IMS storage which > are _not_ allowing to optimize it in a device specific manner as > demonstrated in this discussion. IMS doesn't restrict this optimization, but to allow it requires more OS support as you had mentioned. > > It also precludes obvious use cases like passing a full device to a > guest and let the guest manage SIOV subdevices for containers or nested > guests. > > TBH, to me this is just another hastily cobbled together half thought > out misfeature cast in silicon. The proposed software support is > following the exactly same principle. Current IMS support adds incremental feature capability. Works pretty much following everything that was created for MSIx, but just adds some device flexibility. Here are some reasons why PASID isn't used today for tagging interrupts. Interrupt messages (as specified by MSI/MSI-X in PCI specification) are currently defined as DWORD DMA writes to a platform/architecture specific address (0xFEExxxxx on Intel platforms). Existing root-complexes detect DWORD writes to 0xFEExxxxx (without a PASID in the transaction) as interrupt messages and route them to interrupt-remapping logic (as opposed to other DMA requests that are routed to IOMMU's DMA remapping logic). There are multiple tools (such as logic analyzers) and OEM test validation harnesses that depend on such DWORD sized DMA writes with no PASID as interrupt messages. One of the feedback we had received in the development of the specification was to avoid impacting such tools irrespective of MSI-X or IMS was used for interrupt message storage (on the wire they follow the same format), and also to ensure interoperability of devices supporting IMS across CPU vendors (who may not support PASID TLP prefix). This is one reason that led to interrupts from IMS to not use PASID (and match the wire format of MSI/MSI-X generated interrupts). The other problem was disambiguation between DMA to SVM v/s interrupts. > > So before we go anywhere with this, I want to see a proper way forward > to support _all_ sensible use cases and to fulfil the promise of > "device-specific optimized manner" at the conceptual and specification > and also at the code level. > > I'm not at all interested to rush in support for a half baken Intel > centric solution which other people have to clean up after the fact > (again). Intel had published the specification almost 2 years back and have comprehended all the feedback received from the ecosystem (both open-source and others), along with offering the specification to be implemented by any vendors (both device and CPU vendors). There are few device vendors who are implementing to the spec already and are being explored for support by other CPU vendors Cheers, Ashok
next prev parent reply other threads:[~2020-11-10 5:14 UTC|newest] Thread overview: 125+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-10-30 18:50 [PATCH v4 00/17] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang 2020-10-30 18:50 ` [PATCH v4 01/17] irqchip: Add IMS (Interrupt Message Store) driver Dave Jiang 2020-10-30 22:01 ` Thomas Gleixner 2020-10-30 18:51 ` [PATCH v4 02/17] iommu/vt-d: Add DEV-MSI support Dave Jiang 2020-10-30 20:31 ` Thomas Gleixner 2020-10-30 20:52 ` Dave Jiang 2020-10-30 18:51 ` [PATCH v4 03/17] dmaengine: idxd: add theory of operation documentation for idxd mdev Dave Jiang 2020-10-30 18:51 ` [PATCH v4 04/17] dmaengine: idxd: add support for readonly config devices Dave Jiang 2020-10-30 18:51 ` [PATCH v4 05/17] dmaengine: idxd: add interrupt handle request support Dave Jiang 2020-10-30 18:51 ` [PATCH v4 06/17] PCI: add SIOV and IMS capability detection Dave Jiang 2020-10-30 19:51 ` Bjorn Helgaas 2020-10-30 21:20 ` Dave Jiang 2020-10-30 21:50 ` Bjorn Helgaas 2020-10-30 22:45 ` Jason Gunthorpe 2020-10-30 22:49 ` Dave Jiang 2020-11-02 13:21 ` Jason Gunthorpe 2020-11-03 2:49 ` Tian, Kevin 2020-11-03 12:43 ` Jason Gunthorpe 2020-11-04 3:41 ` Tian, Kevin 2020-11-04 12:40 ` Jason Gunthorpe 2020-11-04 13:34 ` Tian, Kevin 2020-11-04 13:54 ` Jason Gunthorpe 2020-11-06 9:48 ` Tian, Kevin 2020-11-06 13:14 ` Jason Gunthorpe 2020-11-06 16:48 ` Raj, Ashok 2020-11-06 17:51 ` Jason Gunthorpe 2020-11-06 23:47 ` Dan Williams 2020-11-07 0:12 ` Jason Gunthorpe 2020-11-07 1:42 ` Dan Williams 2020-11-08 18:11 ` Raj, Ashok 2020-11-08 18:34 ` David Woodhouse 2020-11-08 23:25 ` Raj, Ashok 2020-11-10 14:19 ` Raj, Ashok 2020-11-10 14:41 ` David Woodhouse 2020-11-08 23:41 ` Jason Gunthorpe 2020-11-09 0:05 ` Raj, Ashok 2020-11-08 18:47 ` Thomas Gleixner 2020-11-08 19:36 ` David Woodhouse 2020-11-08 22:47 ` Thomas Gleixner 2020-11-08 23:29 ` Jason Gunthorpe 2020-11-11 15:41 ` Christoph Hellwig 2020-11-11 16:09 ` Raj, Ashok 2020-11-11 22:27 ` Thomas Gleixner 2020-11-11 23:03 ` Raj, Ashok 2020-11-12 1:13 ` Thomas Gleixner 2020-11-12 13:10 ` Jason Gunthorpe 2020-11-08 23:23 ` Jason Gunthorpe 2020-11-08 23:36 ` Raj, Ashok 2020-11-09 7:37 ` Tian, Kevin 2020-11-09 16:46 ` Jason Gunthorpe 2020-11-08 23:58 ` Raj, Ashok 2020-11-09 7:59 ` Tian, Kevin 2020-11-09 11:21 ` Thomas Gleixner 2020-11-09 17:30 ` Jason Gunthorpe 2020-11-09 22:40 ` Raj, Ashok 2020-11-09 22:42 ` Thomas Gleixner 2020-11-10 5:14 ` Raj, Ashok [this message] 2020-11-10 10:27 ` Thomas Gleixner 2020-11-10 14:13 ` Raj, Ashok 2020-11-10 14:23 ` Jason Gunthorpe 2020-11-11 2:17 ` Tian, Kevin 2020-11-12 13:46 ` Jason Gunthorpe 2020-11-11 7:14 ` Tian, Kevin 2020-11-12 19:32 ` Konrad Rzeszutek Wilk 2020-11-12 22:42 ` Thomas Gleixner 2020-11-13 2:42 ` Tian, Kevin 2020-11-13 12:57 ` Jason Gunthorpe 2020-11-13 13:32 ` Thomas Gleixner 2020-11-13 16:12 ` Luck, Tony 2020-11-13 17:38 ` Raj, Ashok 2020-11-14 10:34 ` Christoph Hellwig 2020-11-14 21:18 ` Raj, Ashok 2020-11-15 11:26 ` Thomas Gleixner 2020-11-15 19:31 ` Raj, Ashok 2020-11-15 22:11 ` Thomas Gleixner 2020-11-16 0:22 ` Raj, Ashok 2020-11-16 7:31 ` Tian, Kevin 2020-11-16 15:46 ` Jason Gunthorpe 2020-11-16 17:56 ` Thomas Gleixner 2020-11-16 18:02 ` Jason Gunthorpe 2020-11-16 20:37 ` Thomas Gleixner 2020-11-16 23:51 ` Tian, Kevin 2020-11-17 9:21 ` Thomas Gleixner 2020-11-16 8:25 ` Christoph Hellwig 2020-11-10 14:19 ` Jason Gunthorpe 2020-11-11 2:35 ` Tian, Kevin 2020-11-08 21:18 ` Thomas Gleixner 2020-11-08 22:09 ` David Woodhouse 2020-11-08 22:52 ` Thomas Gleixner 2020-11-07 0:32 ` Thomas Gleixner 2020-11-09 5:25 ` Tian, Kevin 2020-10-30 18:51 ` [PATCH v4 07/17] dmaengine: idxd: add IMS support in base driver Dave Jiang 2020-10-30 18:51 ` [PATCH v4 08/17] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang 2020-10-30 18:51 ` [PATCH v4 09/17] dmaengine: idxd: add basic mdev registration and helper functions Dave Jiang 2020-10-30 18:51 ` [PATCH v4 10/17] dmaengine: idxd: add emulation rw routines Dave Jiang 2020-10-30 18:52 ` [PATCH v4 11/17] dmaengine: idxd: prep for virtual device commands Dave Jiang 2020-10-30 18:52 ` [PATCH v4 12/17] dmaengine: idxd: virtual device commands emulation Dave Jiang 2020-10-30 18:52 ` [PATCH v4 13/17] dmaengine: idxd: ims setup for the vdcm Dave Jiang 2020-10-30 21:26 ` Thomas Gleixner 2020-10-30 18:52 ` [PATCH v4 14/17] dmaengine: idxd: add mdev type as a new wq type Dave Jiang 2020-10-30 18:52 ` [PATCH v4 15/17] dmaengine: idxd: add dedicated wq mdev type Dave Jiang 2020-10-30 18:52 ` [PATCH v4 16/17] dmaengine: idxd: add new wq state for mdev Dave Jiang 2020-10-30 18:52 ` [PATCH v4 17/17] dmaengine: idxd: add error notification from host driver to mediated device Dave Jiang 2020-10-30 18:58 ` [PATCH v4 00/17] Add VFIO mediated device support and DEV-MSI support for the idxd driver Jason Gunthorpe 2020-10-30 19:13 ` Dave Jiang 2020-10-30 19:17 ` Jason Gunthorpe 2020-10-30 19:23 ` Raj, Ashok 2020-10-30 19:30 ` Jason Gunthorpe 2020-10-30 20:43 ` Raj, Ashok 2020-10-30 22:54 ` Jason Gunthorpe 2020-10-31 2:50 ` Thomas Gleixner 2020-10-31 23:53 ` Raj, Ashok 2020-11-02 13:20 ` Jason Gunthorpe 2020-11-02 16:20 ` Raj, Ashok 2020-11-02 17:19 ` Jason Gunthorpe 2020-11-02 18:18 ` Dave Jiang 2020-11-02 18:26 ` Jason Gunthorpe 2020-11-02 18:38 ` Dan Williams 2020-11-02 18:51 ` Jason Gunthorpe 2020-11-02 19:26 ` Dan Williams 2020-10-30 20:48 ` Thomas Gleixner 2020-10-30 20:59 ` Dave Jiang 2020-10-30 22:10 ` Thomas Gleixner [not found] <email@example.com> 2020-11-09 14:08 ` [PATCH v4 06/17] PCI: add SIOV and IMS capability detection Thomas Gleixner 2020-11-09 18:10 ` Raj, Ashok
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20201110051412.GA20147@otc-nc-03 \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --subject='Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).