linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Tian, Kevin" <kevin.tian@intel.com>
To: "Raj, Ashok" <ashok.raj@intel.com>, Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Hellwig <hch@infradead.org>,
	"Wilk, Konrad" <konrad.wilk@oracle.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"Jiang, Dave" <dave.jiang@intel.com>,
	"Bjorn Helgaas" <helgaas@kernel.org>,
	"vkoul@kernel.org" <vkoul@kernel.org>,
	"Dey, Megha" <megha.dey@intel.com>,
	"maz@kernel.org" <maz@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"parav@mellanox.com" <parav@mellanox.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"netanelg@mellanox.com" <netanelg@mellanox.com>,
	"shahafs@mellanox.com" <shahafs@mellanox.com>,
	"yan.y.zhao@linux.intel.com" <yan.y.zhao@linux.intel.com>,
	"pbonzini@redhat.com" <pbonzini@redhat.com>,
	"Ortiz, Samuel" <samuel.ortiz@intel.com>,
	"Hossain, Mona" <mona.hossain@intel.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Raj, Ashok" <ashok.raj@intel.com>
Subject: RE: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection
Date: Mon, 16 Nov 2020 07:31:49 +0000	[thread overview]
Message-ID: <MWHPR11MB164539B8FDE63D5CBDA300E18CE30@MWHPR11MB1645.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20201116002232.GA2440@araj-mobl1.jf.intel.com>

> From: Raj, Ashok <ashok.raj@intel.com>
> Sent: Monday, November 16, 2020 8:23 AM
> 
> On Sun, Nov 15, 2020 at 11:11:27PM +0100, Thomas Gleixner wrote:
> > On Sun, Nov 15 2020 at 11:31, Ashok Raj wrote:
> > > On Sun, Nov 15, 2020 at 12:26:22PM +0100, Thomas Gleixner wrote:
> > >> > opt-in by device or kernel? The way we are planning to support this is:
> > >> >
> > >> > Device support for IMS - Can discover in device specific means
> > >> > Kernel support for IMS. - Supported by IOMMU driver.
> > >>
> > >> And why exactly do we have to enforce IOMMU support? Please stop
> looking
> > >> at IMS purely from the IDXD perspective. We are talking about the
> > >> general concept here and not about the restricted Intel universe.
> > >
> > > I think you have mentioned it almost every reply :-)..Got that! Point taken
> > > several emails ago!! :-)
> >
> > You sure? I _try_ to not mention it again then. No promise though. :)
> 
> Hey.. anything that's entertaining go for it :-)
> 
> >
> > > I didn't mean just for idxd, I said for *ANY* device driver that wants to
> > > use IMS.
> >
> > Which is wrong. Again:
> >
> > A) For PF/VF on bare metal there is absolutely no IOMMU dependency
> >    because it does not have a PASID requirement. It's just an
> >    alternative solution to MSI[X], which allows optimizations like
> >    storing the message in driver manages queue memory or lifting the
> >    restriction of 2048 interrupts per device. Nothing else.
> 
> You are right.. my eyes were clouded by virtualization.. no dependency for
> native absolutely.
> 
> >
> > B) For PF/VF in a guest the IOMMU dependency of IMS is a red herring.
> >    There is no direct dependency on the IOMMU.
> >
> >    The problem is the inability of the VMM to trap the message write to
> >    the IMS storage if the storage is in guest driver managed memory.
> >    This can be solved with either
> >
> >    - a hypercall which translates the guest MSI message
> >    or
> >    - a vIOMMU which uses a hypercall or whatever to translate the guest
> >      MSI message
> >
> > C) Subdevices ala mdev are a different story. They require PASID which
> >    enforces IOMMU and the IMS part is not managed by the users anyway.
> 
> You are right again :)
> 
> The subdevices require PASID & IOMMU in native, but inside the guest there
> is no
> need for IOMMU unless you want to build SVM on top. subdevices work
> without
> any vIOMMU or hypercall in the guest. Only because they look like normal
> PCI devices we could map interrupts to legacy MSIx.

Guest managed subdevices on PF/VF requires vIOMMU. Anyway I think
Thomas was just pointing out that subdevices are the only category out
of above three which may have business tied to IOMMU. 😊

> 
> >
> > So we have a couple of problems to solve:
> >
> >   1) Figure out whether the OS runs on bare metal
> >
> >      There is no reliable answer to that, so we either:
> >
> >       - Use heuristics and assume that failure is unlikely and in case
> >         of failure blame the incompetence of VMM authors and/or
> >         sysadmins
> >
> >      or
> >
> >       - Default to IMS disabled and let the sysadmin enable it via
> >         command line option.
> >
> >         If the kernel detects to run in a VM it yells and disables it
> >         unless the OS and the hypervisor agree to provide support for
> >         that scenario (see #2).
> >
> >         That's fails as well if the sysadmin does so when the OS runs on
> >         a VMM which is not identifiable, but at least we can rightfully
> >         blame the sysadmin in that case.
> 
> cmdline isn't nice, best to have this functional out of box.
> 
> >
> >      or
> >
> >       - Declare that IMS always depends on IOMMU
> 
> As you had mentioned IMS has no real dependency on IOMMU in native.
> 
> we just need to make sure if running in guest we have support for it
> plumbed.
> 
> >
> >         I personaly don't care, but people working on these kind of
> >         device already said, that they want to avoid it when possible.
> >
> >         If you want to go that route, then please talk to those folks
> >         and ask them to agree in public.
> >
> >      You also need to take into account that this must work on all
> >      architectures which support virtualization because IMS is
> >      architecture independent.
> 
> What you suggest makes perfect sense. We can certainly get buy in from
> iommu list and have this co-ordinated between all existing iommu varients.

Does a hybrid scheme sound good here?

- Say a cmdline parameter: ims=[auto|on|off], with 'auto' as default;

- if ims=auto:

    * If arch doesn't implement probably_on_bare_metal, disallow ims;

    * If probably_on_bare_metal returns false, disallow ims;
	# (future) if hypercall is supported, allow ims;

    * If probably_on_bare_metal returns true, allow ims with caveat on
possible mis-interception of running on an old hypervisor. Sysadmin
may need to double-confirm in other means 
	# (future) if definitely_on_bare_metal is supported, no caveat;

- if ims=on:

    * If probably_on_bare_metal return false, yell and disable it until
hypercall is supported;

    * In all other cases allow ims. Sysadmin should be blamed if any
failure as doing so implies that extra confirmation has been done;

- if ims=off, then leave it off.

It's not necessary to claim strict dependency between ims and iommu.
Instead, we could leave iommu being an arch specific check when it
applies:

probably_on_bare_metal()
{
       if (CPUID(FEATURE_HYPERVISOR))
        	return false;
       if (dmi_match_hypervisor_vendor())
        	return false;
       if (iommu_existing() && iommu_in_guest())
        	return false;

        return PROBABLY_RUNNING_ON_BARE_METAL;
}
 
> 
> >
> >   2) Guest support for PF/VF
> >
> >      Again we have several scenarios depending on the IMS storage
> >      type.
> >
> >       - If the storage type is device memory then it's pretty much the
> >         same as MSI[X] just a different location.
> 
> True, but still need to have some special handling for trapping those mmio
> access. Unlike for MSIx VFIO already traps them and everything is
> pre-plummbed. It isn't seamless as its for MSIx.

yes. So what about tying guest IMS to hypercall even when emulation
is possible on some devices?  It's difficult for the guest to know that
its IMS is emulated by hypervisor. Adopting an unified policy for all
IMS-capable devices might be an easier path.

> 
> >
> >       - If the storage is in driver managed memory then this needs
> >         #1 plus guest OS and hypervisor support (hypercall/vIOMMU)
> 
> Violent agreement here :-)
> 
> >
> >   3) Guest support for PF/VF and guest managed subdevice (mdev)
> >
> >      Depends on #1 and #2 and is an orthogonal problem if I'm not
> >      missing something.
> >
> > To move forward we need to make a decision about #1 and #2 now.
> 
> Mostly in agreement. Except for mdev (current considered use case) have no
> need for IMS in the guest. (Don't get me wrong, I'm not saying some odd
> device managing sub-devices would need IMS in addition and that the 2048
> MSIx emulation.
> >
> > This needs to be well thought out as changing it after the fact is
> > going to be a nightmare.
> >
> > /me grudgingly refrains from mentioning the obvious once more.
> >
> 
> So this isn't an idxd and Intel only thing :-)...
> 
> Cheers,
> Ashok

Thanks
Kevin


  reply	other threads:[~2020-11-16  7:34 UTC|newest]

Thread overview: 125+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-30 18:50 [PATCH v4 00/17] Add VFIO mediated device support and DEV-MSI support for the idxd driver Dave Jiang
2020-10-30 18:50 ` [PATCH v4 01/17] irqchip: Add IMS (Interrupt Message Store) driver Dave Jiang
2020-10-30 22:01   ` Thomas Gleixner
2020-10-30 18:51 ` [PATCH v4 02/17] iommu/vt-d: Add DEV-MSI support Dave Jiang
2020-10-30 20:31   ` Thomas Gleixner
2020-10-30 20:52     ` Dave Jiang
2020-10-30 18:51 ` [PATCH v4 03/17] dmaengine: idxd: add theory of operation documentation for idxd mdev Dave Jiang
2020-10-30 18:51 ` [PATCH v4 04/17] dmaengine: idxd: add support for readonly config devices Dave Jiang
2020-10-30 18:51 ` [PATCH v4 05/17] dmaengine: idxd: add interrupt handle request support Dave Jiang
2020-10-30 18:51 ` [PATCH v4 06/17] PCI: add SIOV and IMS capability detection Dave Jiang
2020-10-30 19:51   ` Bjorn Helgaas
2020-10-30 21:20     ` Dave Jiang
2020-10-30 21:50       ` Bjorn Helgaas
2020-10-30 22:45       ` Jason Gunthorpe
2020-10-30 22:49         ` Dave Jiang
2020-11-02 13:21           ` Jason Gunthorpe
2020-11-03  2:49             ` Tian, Kevin
2020-11-03 12:43               ` Jason Gunthorpe
2020-11-04  3:41                 ` Tian, Kevin
2020-11-04 12:40                   ` Jason Gunthorpe
2020-11-04 13:34                     ` Tian, Kevin
2020-11-04 13:54                       ` Jason Gunthorpe
2020-11-06  9:48                         ` Tian, Kevin
2020-11-06 13:14                           ` Jason Gunthorpe
2020-11-06 16:48                             ` Raj, Ashok
2020-11-06 17:51                               ` Jason Gunthorpe
2020-11-06 23:47                                 ` Dan Williams
2020-11-07  0:12                                   ` Jason Gunthorpe
2020-11-07  1:42                                     ` Dan Williams
2020-11-08 18:11                                     ` Raj, Ashok
2020-11-08 18:34                                       ` David Woodhouse
2020-11-08 23:25                                         ` Raj, Ashok
2020-11-10 14:19                                           ` Raj, Ashok
2020-11-10 14:41                                             ` David Woodhouse
2020-11-08 23:41                                       ` Jason Gunthorpe
2020-11-09  0:05                                         ` Raj, Ashok
2020-11-08 18:47                                     ` Thomas Gleixner
2020-11-08 19:36                                       ` David Woodhouse
2020-11-08 22:47                                         ` Thomas Gleixner
2020-11-08 23:29                                           ` Jason Gunthorpe
2020-11-11 15:41                                         ` Christoph Hellwig
2020-11-11 16:09                                           ` Raj, Ashok
2020-11-11 22:27                                             ` Thomas Gleixner
2020-11-11 23:03                                               ` Raj, Ashok
2020-11-12  1:13                                                 ` Thomas Gleixner
2020-11-12 13:10                                                 ` Jason Gunthorpe
2020-11-08 23:23                                       ` Jason Gunthorpe
2020-11-08 23:36                                         ` Raj, Ashok
2020-11-09  7:37                                         ` Tian, Kevin
2020-11-09 16:46                                           ` Jason Gunthorpe
2020-11-08 23:58                                       ` Raj, Ashok
2020-11-09  7:59                                         ` Tian, Kevin
2020-11-09 11:21                                         ` Thomas Gleixner
2020-11-09 17:30                                           ` Jason Gunthorpe
2020-11-09 22:40                                             ` Raj, Ashok
2020-11-09 22:42                                             ` Thomas Gleixner
2020-11-10  5:14                                               ` Raj, Ashok
2020-11-10 10:27                                                 ` Thomas Gleixner
2020-11-10 14:13                                                   ` Raj, Ashok
2020-11-10 14:23                                                     ` Jason Gunthorpe
2020-11-11  2:17                                                       ` Tian, Kevin
2020-11-12 13:46                                                         ` Jason Gunthorpe
2020-11-11  7:14                                                     ` Tian, Kevin
2020-11-12 19:32                                                       ` Konrad Rzeszutek Wilk
2020-11-12 22:42                                                         ` Thomas Gleixner
2020-11-13  2:42                                                           ` Tian, Kevin
2020-11-13 12:57                                                             ` Jason Gunthorpe
2020-11-13 13:32                                                             ` Thomas Gleixner
2020-11-13 16:12                                                               ` Luck, Tony
2020-11-13 17:38                                                                 ` Raj, Ashok
2020-11-14 10:34                                                           ` Christoph Hellwig
2020-11-14 21:18                                                             ` Raj, Ashok
2020-11-15 11:26                                                               ` Thomas Gleixner
2020-11-15 19:31                                                                 ` Raj, Ashok
2020-11-15 22:11                                                                   ` Thomas Gleixner
2020-11-16  0:22                                                                     ` Raj, Ashok
2020-11-16  7:31                                                                       ` Tian, Kevin [this message]
2020-11-16 15:46                                                                         ` Jason Gunthorpe
2020-11-16 17:56                                                                           ` Thomas Gleixner
2020-11-16 18:02                                                                             ` Jason Gunthorpe
2020-11-16 20:37                                                                               ` Thomas Gleixner
2020-11-16 23:51                                                                               ` Tian, Kevin
2020-11-17  9:21                                                                                 ` Thomas Gleixner
2020-11-16  8:25                                                               ` Christoph Hellwig
2020-11-10 14:19                                                 ` Jason Gunthorpe
2020-11-11  2:35                                                   ` Tian, Kevin
2020-11-08 21:18                             ` Thomas Gleixner
2020-11-08 22:09                               ` David Woodhouse
2020-11-08 22:52                                 ` Thomas Gleixner
2020-11-07  0:32                           ` Thomas Gleixner
2020-11-09  5:25                             ` Tian, Kevin
2020-10-30 18:51 ` [PATCH v4 07/17] dmaengine: idxd: add IMS support in base driver Dave Jiang
2020-10-30 18:51 ` [PATCH v4 08/17] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang
2020-10-30 18:51 ` [PATCH v4 09/17] dmaengine: idxd: add basic mdev registration and helper functions Dave Jiang
2020-10-30 18:51 ` [PATCH v4 10/17] dmaengine: idxd: add emulation rw routines Dave Jiang
2020-10-30 18:52 ` [PATCH v4 11/17] dmaengine: idxd: prep for virtual device commands Dave Jiang
2020-10-30 18:52 ` [PATCH v4 12/17] dmaengine: idxd: virtual device commands emulation Dave Jiang
2020-10-30 18:52 ` [PATCH v4 13/17] dmaengine: idxd: ims setup for the vdcm Dave Jiang
2020-10-30 21:26   ` Thomas Gleixner
2020-10-30 18:52 ` [PATCH v4 14/17] dmaengine: idxd: add mdev type as a new wq type Dave Jiang
2020-10-30 18:52 ` [PATCH v4 15/17] dmaengine: idxd: add dedicated wq mdev type Dave Jiang
2020-10-30 18:52 ` [PATCH v4 16/17] dmaengine: idxd: add new wq state for mdev Dave Jiang
2020-10-30 18:52 ` [PATCH v4 17/17] dmaengine: idxd: add error notification from host driver to mediated device Dave Jiang
2020-10-30 18:58 ` [PATCH v4 00/17] Add VFIO mediated device support and DEV-MSI support for the idxd driver Jason Gunthorpe
2020-10-30 19:13   ` Dave Jiang
2020-10-30 19:17     ` Jason Gunthorpe
2020-10-30 19:23       ` Raj, Ashok
2020-10-30 19:30         ` Jason Gunthorpe
2020-10-30 20:43           ` Raj, Ashok
2020-10-30 22:54             ` Jason Gunthorpe
2020-10-31  2:50             ` Thomas Gleixner
2020-10-31 23:53               ` Raj, Ashok
2020-11-02 13:20                 ` Jason Gunthorpe
2020-11-02 16:20                   ` Raj, Ashok
2020-11-02 17:19                     ` Jason Gunthorpe
2020-11-02 18:18                       ` Dave Jiang
2020-11-02 18:26                         ` Jason Gunthorpe
2020-11-02 18:38                           ` Dan Williams
2020-11-02 18:51                             ` Jason Gunthorpe
2020-11-02 19:26                               ` Dan Williams
2020-10-30 20:48 ` Thomas Gleixner
2020-10-30 20:59   ` Dave Jiang
2020-10-30 22:10     ` Thomas Gleixner
     [not found] <draft-875z6ekcj5.fsf@nanos.tec.linutronix.de>
2020-11-09 14:08 ` [PATCH v4 06/17] PCI: add SIOV and IMS capability detection Thomas Gleixner
2020-11-09 18:10   ` Raj, Ashok

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR11MB164539B8FDE63D5CBDA300E18CE30@MWHPR11MB1645.namprd11.prod.outlook.com \
    --to=kevin.tian@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=eric.auger@redhat.com \
    --cc=hch@infradead.org \
    --cc=helgaas@kernel.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jgg@nvidia.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=megha.dey@intel.com \
    --cc=mona.hossain@intel.com \
    --cc=netanelg@mellanox.com \
    --cc=parav@mellanox.com \
    --cc=pbonzini@redhat.com \
    --cc=rafael@kernel.org \
    --cc=samuel.ortiz@intel.com \
    --cc=sanjay.k.kumar@intel.com \
    --cc=shahafs@mellanox.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vkoul@kernel.org \
    --cc=yan.y.zhao@linux.intel.com \
    --cc=yi.l.liu@intel.com \
    --subject='RE: [PATCH v4 06/17] PCI: add SIOV and IMS capability detection' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox