linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Tian, Kevin" <kevin.tian@intel.com>
To: Jason Gunthorpe <jgg@mellanox.com>, "Raj, Ashok" <ashok.raj@intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	"Jiang, Dave" <dave.jiang@intel.com>,
	"vkoul@kernel.org" <vkoul@kernel.org>,
	"megha.dey@linux.intel.com" <megha.dey@linux.intel.com>,
	"maz@kernel.org" <maz@kernel.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Lu, Baolu" <baolu.lu@intel.com>,
	"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"Lin, Jing" <jing.lin@intel.com>,
	"Williams, Dan J" <dan.j.williams@intel.com>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"parav@mellanox.com" <parav@mellanox.com>,
	"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"x86@kernel.org" <x86@kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: RE: [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver.
Date: Wed, 13 May 2020 08:30:15 +0000	[thread overview]
Message-ID: <MWHPR11MB1645C60468BC6C6009C3DDE28CBF0@MWHPR11MB1645.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20200509122113.GP19158@mellanox.com>

> From: Jason Gunthorpe
> Sent: Saturday, May 9, 2020 8:21 PM
> > > putting emulation code back into them, except in a more dangerous
> > > kernel location. This does not seem like a net win to me.
> >
> > Its not a whole lot of emulation right? mdev are soft partitioned. There is
> > just a single PF, but we can create a separate partition for the guest using
> > PASID along with the normal BDF (RID). And exposing a consistent PCI like
> > interface to user space you get everything else for free.
> >
> > Yes, its not SRIOV, but giving that interface to user space via VFIO, we get
> > all of that functionality without having to reinvent a different way to do it.
> >
> > vDPA went the other way, IRC, they went and put a HW implementation of
> what
> > virtio is in hardware. So they sort of fit the model. Here the instance
> > looks and feels like real hardware for the setup and control aspect.
> 
> VDPA and this are very similar, of course it depends on the exact HW
> implementation.
> 

Hi, Jason,

I have more thoughts below. let's see whether making sense to you.

When talking about virtualization, here the target is unmodified guest 
kernel driver which expects seeing the raw controllability of queues 
as defined by device spec. In idxd, such controllability includes enable/
disable SVA, dedicated or shared WQ, size, threshold, privilege, fault 
mode, max batch size, and many other attributes. Different guest OS 
has its own policy of using all or partial available controllability. 

When talking about application, we care about providing an efficient
programming interface to userspace. For example with uacce, we
allow an application to submit vaddr-based workloads to a reserved
WQ with kernel bypassed. But it's not necessary to export the raw
controllability of the reserved WQ to userspace, and we still rely on
kernel driver to configure it including bind_mm. I'm not sure whether 
uacce would like to evolve as a generic queue management system
including non-SVA and all vendor specific raw capabilities as 
expected by all kinds of guest kernel drivers. It sounds like not 
worthwhile at this point, given that we already have an highly efficient 
SVA interface for user applications.

That is why we start with mdev as an evolutionary approach. Mdev is 
introduced to expose raw controllability of a subdevice (WQ or ADI) to 
guest. It build a channel between guest kernel driver and host kernel 
driver and uses device spec as the uAPI by sticking to the mmio interface.
and all virtualization related setups are just consolidated together in vfio. 
the drawback, as you pointed out, is putting some degree of emulation
code in the kernel. But as explained earlier, they are only small portion of
code. Moreover, most registers are emulated as simple memory read/
write, while the remaining logic mostly belongs to raw controllability 
(e.g. cmd register) that host driver grants to the guest thus must 
propagate to the device. For the latter part, I would call it more as 
'mediation' instead of 'emulation', as required in whatever uapi would 
be used.

If in the future, there do have such requirement of delegating raw
WQ controllability to pure userspace applications for DMA engines, 
and there is be a well-defined uAPI to cover a large common set of 
controllability across multiple vendors, we will look at that option for
sure.

From above p.o.v, I feel vdpa is a different story. virtio/vhost has a 
well established eco-system between guest and host. The user
space VMM already emulates all available controllability as defined 
in virtio spec. Host kernel already supports vhost uAPI for vring
setup, iotlb management, etc. Extending that path for data path
offloading sounds a reasonable choice for vdpa...

Thanks
Kevin

  parent reply	other threads:[~2020-05-13  8:30 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-21 23:33 Dave Jiang
2020-04-21 23:33 ` [PATCH RFC 01/15] drivers/base: Introduce platform_msi_ops Dave Jiang
2020-04-26  7:01   ` Greg KH
2020-04-27 21:38     ` Dave Jiang
2020-04-28  7:34       ` Greg KH
2020-04-21 23:33 ` [PATCH RFC 02/15] drivers/base: Introduce a new platform-msi list Dave Jiang
2020-04-25 21:13   ` Thomas Gleixner
2020-05-04  0:08     ` Dey, Megha
2020-04-21 23:34 ` [PATCH RFC 03/15] drivers/base: Allocate/free platform-msi interrupts by group Dave Jiang
2020-04-25 21:23   ` Thomas Gleixner
2020-05-04  0:08     ` Dey, Megha
2020-04-21 23:34 ` [PATCH RFC 04/15] drivers/base: Add support for a new IMS irq domain Dave Jiang
2020-04-23 20:11   ` Jason Gunthorpe
2020-05-01 22:30     ` Dey, Megha
2020-05-03 22:25       ` Jason Gunthorpe
2020-05-03 22:40         ` Dey, Megha
2020-05-03 22:46           ` Jason Gunthorpe
2020-05-04  0:25             ` Dey, Megha
2020-05-04 12:14               ` Jason Gunthorpe
2020-05-06 10:27                 ` Tian, Kevin
2020-04-25 21:38   ` Thomas Gleixner
2020-05-04  0:11     ` Dey, Megha
2020-04-21 23:34 ` [PATCH RFC 05/15] ims-msi: Add mask/unmask routines Dave Jiang
2020-04-25 21:49   ` Thomas Gleixner
2020-05-04  0:16     ` Dey, Megha
2020-04-21 23:34 ` [PATCH RFC 06/15] ims-msi: Enable IMS interrupts Dave Jiang
2020-04-25 22:13   ` Thomas Gleixner
2020-05-04  0:17     ` Dey, Megha
2020-04-21 23:34 ` [PATCH RFC 07/15] Documentation: Interrupt Message store Dave Jiang
2020-04-23 20:04   ` Jason Gunthorpe
2020-05-01 22:32     ` Dey, Megha
2020-05-03 22:28       ` Jason Gunthorpe
2020-05-03 22:41         ` Dey, Megha
2020-04-21 23:34 ` [PATCH RFC 08/15] vfio/mdev: Add a member for iommu domain in mdev_device Dave Jiang
2020-04-21 23:34 ` [PATCH RFC 09/15] vfio/type1: Save domain when attach domain to mdev Dave Jiang
2020-04-21 23:34 ` [PATCH RFC 10/15] dmaengine: idxd: add config support for readonly devices Dave Jiang
2020-04-21 23:34 ` [PATCH RFC 11/15] dmaengine: idxd: add IMS support in base driver Dave Jiang
2020-04-21 23:35 ` [PATCH RFC 12/15] dmaengine: idxd: add device support functions in prep for mdev Dave Jiang
2020-04-21 23:35 ` [PATCH RFC 13/15] dmaengine: idxd: add support for VFIO mediated device Dave Jiang
2020-04-21 23:35 ` [PATCH RFC 14/15] dmaengine: idxd: add error notification from host driver to " Dave Jiang
2020-04-21 23:35 ` [PATCH RFC 15/15] dmaengine: idxd: add ABI documentation for mediated device support Dave Jiang
2020-04-21 23:54 ` [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver Jason Gunthorpe
2020-04-22  0:53   ` Tian, Kevin
2020-04-22 11:50     ` Jason Gunthorpe
2020-04-22 21:14       ` Raj, Ashok
2020-04-23 19:12         ` Jason Gunthorpe
2020-04-24  3:27           ` Tian, Kevin
2020-04-24 12:44             ` Jason Gunthorpe
2020-04-24 16:25               ` Tian, Kevin
2020-04-24 18:12                 ` Jason Gunthorpe
2020-04-26  5:18                   ` Tian, Kevin
2020-04-26 19:13                     ` Jason Gunthorpe
2020-04-27  3:43                       ` Alex Williamson
2020-04-27 11:58                         ` Jason Gunthorpe
2020-04-27 13:19                           ` Alex Williamson
2020-04-27 13:22                             ` Jason Gunthorpe
2020-04-27 14:18                               ` Alex Williamson
2020-04-27 14:25                                 ` Jason Gunthorpe
2020-04-27 15:41                                   ` Alex Williamson
2020-04-27 16:16                                     ` Jason Gunthorpe
2020-04-27 16:25                                       ` Dave Jiang
2020-04-27 21:56                                         ` Jason Gunthorpe
2020-04-29  9:42                               ` Tian, Kevin
2020-05-08 20:47                                 ` Raj, Ashok
2020-05-08 23:16                                   ` Jason Gunthorpe
2020-05-08 23:52                                     ` Dave Jiang
2020-05-09  0:09                                     ` Raj, Ashok
2020-05-09 12:21                                       ` Jason Gunthorpe
2020-05-13  2:29                                         ` Jason Wang
2020-05-13  8:30                                         ` Tian, Kevin [this message]
2020-05-13 12:40                                           ` Jason Gunthorpe
2020-04-27 12:13                       ` Tian, Kevin
2020-04-27 12:55                         ` Jason Gunthorpe
2020-04-22 21:24   ` Dan Williams
2020-04-23 19:17     ` Dan Williams
2020-04-23 19:49       ` Jason Gunthorpe
2020-05-01 22:31         ` Dey, Megha
2020-05-03 22:21           ` Jason Gunthorpe
2020-05-03 22:32             ` Dey, Megha
2020-04-23 19:18     ` Jason Gunthorpe
2020-05-01 22:31       ` Dey, Megha
2020-05-03 22:22         ` Jason Gunthorpe
2020-05-03 22:31           ` Dey, Megha
2020-05-03 22:36             ` Jason Gunthorpe
2020-05-04  0:20               ` Dey, Megha
2020-04-22 23:04   ` Dey, Megha
2020-04-23 19:44     ` Jason Gunthorpe
2020-05-01 22:32       ` Dey, Megha
2020-04-24  6:31   ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR11MB1645C60468BC6C6009C3DDE28CBF0@MWHPR11MB1645.namprd11.prod.outlook.com \
    --to=kevin.tian@intel.com \
    --cc=alex.williamson@redhat.com \
    --cc=ashok.raj@intel.com \
    --cc=baolu.lu@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dmaengine@vger.kernel.org \
    --cc=eric.auger@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hpa@zytor.com \
    --cc=jacob.jun.pan@intel.com \
    --cc=jgg@mellanox.com \
    --cc=jing.lin@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=megha.dey@linux.intel.com \
    --cc=parav@mellanox.com \
    --cc=pbonzini@redhat.com \
    --cc=rafael@kernel.org \
    --cc=sanjay.k.kumar@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=vkoul@kernel.org \
    --cc=x86@kernel.org \
    --cc=yi.l.liu@intel.com \
    --subject='RE: [PATCH RFC 00/15] Add VFIO mediated device support and IMS support for the idxd driver.' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).