linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Dan Williams <dan.j.williams@intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Jiang, Dave" <dave.jiang@intel.com>,
	Bjorn Helgaas <helgaas@kernel.org>,
	"vkoul@kernel.org" <vkoul@kernel.org>,
	"Dey, Megha" <megha.dey@intel.com>,
	"maz@kernel.org" <maz@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"jing.lin@intel.com" <jing.lin@intel.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"parav@mellanox.com" <parav@mellanox.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"netanelg@mellanox.com" <netanelg@mellanox.com>,
	"shahafs@mellanox.com" <shahafs@mellanox.com>,
	"yan.y.zhao@linux.intel.com" <yan.y.zhao@linux.intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Ortiz, Samuel" <samuel.ortiz@intel.com>,
	"Hossain, Mona" <mona.hossain@intel.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection
Date: Sun, 8 Nov 2020 19:23:41 -0400	[thread overview]
Message-ID: <20201108232341.GB2620339@nvidia.com> (raw)
In-Reply-To: <87pn4nk7nn.fsf@nanos.tec.linutronix.de>

On Sun, Nov 08, 2020 at 07:47:24PM +0100, Thomas Gleixner wrote:
> 
> That means the guest needs a way to ask the hypervisor for a proper
> translation, i.e. a hypercall. Now where to do that? Looking at the
> above remapping case it's pretty obvious:
> 
> 
>                      |
>                      |
>   [CPU]       -- [VI | RT]  -- [Bridge] --    Bus    -- [Device]
>                      |
>   Alloc          "Compose"                   Store         Use
> 
>   Vectordomain   HCALLdomain                Busdomain
>                  |        ^
>                  |        |
>                  v        | 
>             Hypervisor    
>                Alloc + Compose

Yes, this will describes what I have been thinking

> Now the question which I can't answer is whether this can work correctly
> in terms of isolation. If the IMS storage is in guest memory (queue
> storage) then the guest driver can obviously write random crap into it
> which the device will happily send. (For MSI and IDXD style IMS it
> still can trap the store).

There are four cases of interest here:

 1) Bare metal, PF and VF devices just deliver whatever addr/data pairs
    to the APIC. IMS works perfectly with pci_subdevice_msi_create_irq_domain()

 2) SRIOV VF assigned to the guest.

    The guest can cause any MemWr TLP to any addr/data pair
    and the iommu/platform/vmm is supposed to use the
    Bus/device/function to isolate & secure the interrupt address
    range.

    IMS can work in the guest if the guest knows the details of the
    address range and can make hypercalls to setup routing. So
    pci_subdevice_msi_create_irq_domain() works if the hypercalls
    exist and fails if they don't.

 3) SIOV sub device assigned to the guest.

    The difference between SIOV and SRIOV is the device must attach a
    PASID to every TLP triggered by the guest. Logically we'd expect
    when IMS is used in this situation the interrupt MemWr is tagged
    with bus/device/function/PASID to uniquly ID the guest and the same
    security protection scheme from #2 applies.

 4) SIOV sub device assigned to the guest, but with emulation.

    This SIOV device cannot tag interrupts with PASID so cannot do #2
    (or the platform cannot recieve a PASID tagged interrupt message).

    Since the interrupts are being delivered with TLPs pointing at the
    hypervisor the only solution is for the hypervisor to exclusively
    control the interrupt table. MSI table like emulation for IMS is
    needed and the hypervisor will use pci_subdevice_msi_create_irq_domain()
    to get the real interrupts.

    pci_subdevice_msi_create_irq_domain() needs to return the 'fake'
    addr/data pairs which are actually an ABI between the guest and
    hypervisor carried in the hidden hypercall of the emulation.
    (ie it works like MSI works today)

IDXD is worring about case #4, I think, but I didn't follow in that
whole discussion about the IMS table layout if they PASID tag the IMS
MemWr or not?? Ashok can you clarify?

> Is the IOMMU/Interrupt remapping unit able to catch such messages which
> go outside the space to which the guest is allowed to signal to? If yes,
> problem solved. If no, then IMS storage in guest memory can't ever work.

Right. Only PASID on the interrupt messages can resolve this securely.

> So in case that the HCALL domain is missing, the Vector domain needs
> return an error code on domain creation. If the HCALL domain is there
> then the domain creation works and in case of actual interrupt
> allocation the hypercall either returns a valid composed message or an
> appropriate error code.

Yes
 
> But there's a catch:
> 
> This only works when the guest OS actually knows that it runs in a
> VM. If the guest can't figure that out, i.e. via CPUID, this cannot be
> solved because from the guest OS view that's the same as running on bare
> metal. Obviously on bare metal the Vector domain can and must handle
> this.

Yes

The flip side is today, the way pci_subdevice_msi_create_irq_domain()
works a VF using it on baremetal will succeed and if that same VF is
assigned to a guest then pci_subdevice_msi_create_irq_domain()
succeeds but the interrupt never comes - so the driver is broken.

Yes, if we add some ACPI/etc flag that is not going to magically fix
old kvm's, but at least *something* exists that works right and
generically.

If we follow Intel's path then we need special KVM level support for
*every* device, PCI cap mangling and so on. Forever. Sounds horrible
to me..

This feels like one of these things where no matter what we do
something is broken. Picking the least breakage is the challenge here.

> So this needs some thought.

I think your HAOLL diagram is the only sane architecture.

If go that way then case #4 will still work, in this case the HCALL
will return addr/data pairs that conform to what the emulation
expects. Or fail if the VMM can't do emulation for the device.

Jason

  parent reply	other threads:[~2020-11-08 23:23 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-30 18:50 [PATCH v4 00/17] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
2020-10-30 18:50 ` [PATCH v4 01/17] irqchip: Add IMS (Interrupt Message Store) driver Dave Jiang
2020-10-30 22:01   ` Thomas Gleixner
2020-10-30 18:51 ` [PATCH v4 02/17] iommu/vt-d: Add DEV-MSI support Dave Jiang
2020-10-30 20:31   ` Thomas Gleixner
2020-10-30 20:52     ` Dave Jiang
2020-10-30 18:51 ` [PATCH v4 03/17] dmaengine: idxd: add theory of operation documentation for idxd mdev Dave Jiang
2020-10-30 18:51 ` [PATCH v4 04/17] dmaengine: idxd: add support for readonly config devices Dave Jiang
2020-10-30 18:51 ` [PATCH v4 05/17] dmaengine: idxd: add interrupt handle request support Dave Jiang
2020-10-30 18:51 ` [PATCH v4 06/17] PCI: add SIOV and IMS capability detection Dave Jiang
2020-10-30 19:51   ` Bjorn Helgaas
2020-10-30 21:20     ` Dave Jiang
2020-10-30 21:50       ` Bjorn Helgaas
2020-10-30 22:45       ` Jason Gunthorpe
2020-10-30 22:49         ` Dave Jiang
2020-11-02 13:21           ` Jason Gunthorpe
2020-11-03  2:49             ` Tian, Kevin
2020-11-03 12:43               ` Jason Gunthorpe
2020-11-04  3:41                 ` Tian, Kevin
2020-11-04 12:40                   ` Jason Gunthorpe
2020-11-04 13:34                     ` Tian, Kevin
2020-11-04 13:54                       ` Jason Gunthorpe
2020-11-06  9:48                         ` Tian, Kevin
2020-11-06 13:14                           ` Jason Gunthorpe
2020-11-06 16:48                             ` Raj, Ashok
2020-11-06 17:51                               ` Jason Gunthorpe
2020-11-06 23:47                                 ` Dan Williams
2020-11-07  0:12                                   ` Jason Gunthorpe
2020-11-07  1:42                                     ` Dan Williams
2020-11-08 18:11                                     ` Raj, Ashok
2020-11-08 18:34                                       ` David Woodhouse
2020-11-08 23:25                                         ` Raj, Ashok
2020-11-10 14:19                                           ` Raj, Ashok
2020-11-10 14:41                                             ` David Woodhouse
2020-11-08 23:41                                       ` Jason Gunthorpe
2020-11-09  0:05                                         ` Raj, Ashok
2020-11-08 18:47                                     ` Thomas Gleixner
2020-11-08 19:36                                       ` David Woodhouse
2020-11-08 22:47                                         ` Thomas Gleixner
2020-11-08 23:29                                           ` Jason Gunthorpe
2020-11-11 15:41                                         ` Christoph Hellwig
2020-11-11 16:09                                           ` Raj, Ashok
2020-11-11 22:27                                             ` Thomas Gleixner
2020-11-11 23:03                                               ` Raj, Ashok
2020-11-12  1:13                                                 ` Thomas Gleixner
2020-11-12 13:10                                                 ` Jason Gunthorpe
2020-11-08 23:23                                       ` Jason Gunthorpe [this message]
2020-11-08 23:36                                         ` Raj, Ashok
2020-11-09  7:37                                         ` Tian, Kevin
2020-11-09 16:46                                           ` Jason Gunthorpe
2020-11-08 23:58                                       ` Raj, Ashok
2020-11-09  7:59                                         ` Tian, Kevin
2020-11-09 11:21                                         ` Thomas Gleixner
2020-11-09 17:30                                           ` Jason Gunthorpe
2020-11-09 22:40                                             ` Raj, Ashok
2020-11-09 22:42                                             ` Thomas Gleixner
2020-11-10  5:14                                               ` Raj, Ashok
2020-11-10 10:27                                                 ` Thomas Gleixner
2020-11-10 14:13                                                   ` Raj, Ashok
2020-11-10 14:23                                                     ` Jason Gunthorpe
2020-11-11  2:17                                                       ` Tian, Kevin
2020-11-12 13:46                                                         ` Jason Gunthorpe
2020-11-11  7:14                                                     ` Tian, Kevin
2020-11-12 19:32                                                       ` Konrad Rzeszutek Wilk
2020-11-12 22:42                                                         ` Thomas Gleixner
2020-11-13  2:42                                                           ` Tian, Kevin
2020-11-13 12:57                                                             ` Jason Gunthorpe
2020-11-13 13:32                                                             ` Thomas Gleixner
2020-11-13 16:12                                                               ` Luck, Tony
2020-11-13 17:38                                                                 ` Raj, Ashok
2020-11-14 10:34                                                           ` Christoph Hellwig
2020-11-14 21:18                                                             ` Raj, Ashok
2020-11-15 11:26                                                               ` Thomas Gleixner
2020-11-15 19:31                                                                 ` Raj, Ashok
2020-11-15 22:11                                                                   ` Thomas Gleixner
2020-11-16  0:22                                                                     ` Raj, Ashok
2020-11-16  7:31                                                                       ` Tian, Kevin
2020-11-16 15:46                                                                         ` Jason Gunthorpe
2020-11-16 17:56                                                                           ` Thomas Gleixner
2020-11-16 18:02                                                                             ` Jason Gunthorpe
2020-11-16 20:37                                                                               ` Thomas Gleixner
2020-11-16 23:51                                                                               ` Tian, Kevin
2020-11-17  9:21                                                                                 ` Thomas Gleixner
2020-11-16  8:25                                                               ` Christoph Hellwig
2020-11-10 14:19                                                 ` Jason Gunthorpe
2020-11-11  2:35                                                   ` Tian, Kevin
2020-11-08 21:18                             ` Thomas Gleixner
2020-11-08 22:09                               ` David Woodhouse
2020-11-08 22:52                                 ` Thomas Gleixner
2020-11-07  0:32                           ` Thomas Gleixner
2020-11-09  5:25                             ` Tian, Kevin
2020-10-30 18:51 ` [PATCH v4 07/17] dmaengine: idxd: add IMS support in base driver Dave Jiang
2020-10-30 18:51 ` [PATCH v4 08/17] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang
2020-10-30 18:51 ` [PATCH v4 09/17] dmaengine: idxd: add basic mdev registration and helper functions Dave Jiang
2020-10-30 18:51 ` [PATCH v4 10/17] dmaengine: idxd: add emulation rw routines Dave Jiang
2020-10-30 18:52 ` [PATCH v4 11/17] dmaengine: idxd: prep for virtual device commands Dave Jiang
2020-10-30 18:52 ` [PATCH v4 12/17] dmaengine: idxd: virtual device commands emulation Dave Jiang
2020-10-30 18:52 ` [PATCH v4 13/17] dmaengine: idxd: ims setup for the vdcm Dave Jiang
2020-10-30 21:26   ` Thomas Gleixner
2020-10-30 18:52 ` [PATCH v4 14/17] dmaengine: idxd: add mdev type as a new wq type Dave Jiang
2020-10-30 18:52 ` [PATCH v4 15/17] dmaengine: idxd: add dedicated wq mdev type Dave Jiang
2020-10-30 18:52 ` [PATCH v4 16/17] dmaengine: idxd: add new wq state for mdev Dave Jiang
2020-10-30 18:52 ` [PATCH v4 17/17] dmaengine: idxd: add error notification from host driver to mediated device Dave Jiang
2020-10-30 18:58 ` [PATCH v4 00/17] Add VFIO mediated device support and DEV-MSI support for the idxd driver Jason Gunthorpe
2020-10-30 19:13   ` Dave Jiang
2020-10-30 19:17     ` Jason Gunthorpe
2020-10-30 19:23       ` Raj, Ashok
2020-10-30 19:30         ` Jason Gunthorpe
2020-10-30 20:43           ` Raj, Ashok
2020-10-30 22:54             ` Jason Gunthorpe
2020-10-31  2:50             ` Thomas Gleixner
2020-10-31 23:53               ` Raj, Ashok
2020-11-02 13:20                 ` Jason Gunthorpe
2020-11-02 16:20                   ` Raj, Ashok
2020-11-02 17:19                     ` Jason Gunthorpe
2020-11-02 18:18                       ` Dave Jiang
2020-11-02 18:26                         ` Jason Gunthorpe
2020-11-02 18:38                           ` Dan Williams
2020-11-02 18:51                             ` Jason Gunthorpe
2020-11-02 19:26                               ` Dan Williams
2020-10-30 20:48 ` Thomas Gleixner
2020-10-30 20:59   ` Dave Jiang
2020-10-30 22:10     ` Thomas Gleixner
     [not found] <draft-875z6ekcj5.fsf@nanos.tec.linutronix.de>
2020-11-09 14:08 ` [PATCH v4 06/17] PCI: add SIOV and IMS capability detection Thomas Gleixner
2020-11-09 18:10   ` Raj, Ashok

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201108232341.GB2620339@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=eric.auger@redhat.com \
    --cc=helgaas@kernel.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jing.lin@intel.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=megha.dey@intel.com \
    --cc=mona.hossain@intel.com \
    --cc=netanelg@mellanox.com \
    --cc=parav@mellanox.com \
    --cc=pbonzini@redhat.com \
    --cc=rafael@kernel.org \
    --cc=samuel.ortiz@intel.com \
    --cc=sanjay.k.kumar@intel.com \
    --cc=shahafs@mellanox.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vkoul@kernel.org \
    --cc=yan.y.zhao@linux.intel.com \
    --cc=yi.l.liu@intel.com \
    --subject='Re: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox