From: Jason Wang <jasowang@redhat.com>
To: "Tian, Kevin" <kevin.tian@intel.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Alex Williamson <alex.williamson@redhat.com>
Cc: "Jiang, Dave" <dave.jiang@intel.com>,
"vkoul@kernel.org" <vkoul@kernel.org>,
"Dey, Megha" <megha.dey@intel.com>,
"maz@kernel.org" <maz@kernel.org>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"rafael@kernel.org" <rafael@kernel.org>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"hpa@zytor.com" <hpa@zytor.com>,
"Pan, Jacob jun" <jacob.jun.pan@intel.com>,
"Raj, Ashok" <ashok.raj@intel.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
"Lu, Baolu" <baolu.lu@intel.com>,
"Kumar, Sanjay K" <sanjay.k.kumar@intel.com>,
"Luck, Tony" <tony.luck@intel.com>,
"Lin, Jing" <jing.lin@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
"eric.auger@redhat.com" <eric.auger@redhat.com>,
"parav@mellanox.com" <parav@mellanox.com>,
"Hansen, Dave" <dave.hansen@intel.com>,
"netanelg@mellanox.com" <netanelg@mellanox.com>,
"shahafs@mellanox.com" <shahafs@mellanox.com>,
"yan.y.zhao@linux.intel.com" <yan.y.zhao@linux.intel.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"Ortiz, Samuel" <samuel.ortiz@intel.com>,
"Hossain, Mona" <mona.hossain@intel.com>,
"dmaengine@vger.kernel.org" <dmaengine@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"x86@kernel.org" <x86@kernel.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: [PATCH RFC v2 00/18] Add VFIO mediated device support and DEV-MSI support for the idxd driver
Date: Wed, 12 Aug 2020 11:28:12 +0800
Message-ID: <b59ce5b0-5530-1f30-9852-409f7c9f630a@redhat.com>
In-Reply-To: <MWHPR11MB16452EBE866E330A7E000AFC8C440@MWHPR11MB1645.namprd11.prod.outlook.com>
On 2020/8/10 3:32 PM, Tian, Kevin wrote:
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Friday, August 7, 2020 8:20 PM
>>
>> On Wed, Aug 05, 2020 at 07:22:58PM -0600, Alex Williamson wrote:
>>
>>> If you see this as an abuse of the framework, then let's identify those
>>> specific issues and come up with a better approach. As we've discussed
>>> before, things like basic PCI config space emulation are acceptable
>>> overhead and low risk (imo) and some degree of register emulation is
>>> well within the territory of an mdev driver.
>> What troubles me is that idxd already has a direct userspace interface
>> to its HW, and does userspace DMA. The purpose of this mdev is to
>> provide a second direct userspace interface that is a little different
>> and trivially plugs into the virtualization stack.
> No. Userspace DMA and subdevice passthrough (what mdev provides)
> are two distinct usages IMO (at least in idxd context). This might
> be the main divergence between us, so let me put more words here.
> If we could reach consensus in this matter, which direction to go
> would be clearer.
>
> First, a passthrough interface has some unique requirements
> which are not commonly observed in a userspace DMA interface, e.g.:
>
> - Tracking DMA dirty pages for live migration;
> - A set of interfaces for using SVA inside guest;
> * PASID allocation/free (on some platforms);
> * bind/unbind guest mm/page table (nested translation);
> * invalidate IOMMU cache/iotlb for guest page table changes;
> * report page request from device to guest;
> * forward page response from guest to device;
> - Configuring irqbypass for posted interrupt;
> - ...
>
> Second, a passthrough interface requires delegating raw controllability
> of the subdevice to the guest driver, while the same delegation might not
> be required for implementing a userspace DMA interface (especially for
> modern devices which support SVA). For example, idxd allows the following
> settings per wq (the guest driver may configure them in any combination):
> - put in dedicated or shared mode;
> - enable/disable SVA;
> - associate guest-provided PASID to MSI/IMS entry;
> - set threshold;
> - allow/deny privileged access;
> - allocate/free interrupt handle (enlightened for guest);
> - collect error status;
> - ...
>
> We plan to support idxd userspace DMA with SVA. The driver just needs
> to prepare a wq with a predefined configuration (e.g. shared, SVA,
> etc.), bind the process mm to IOMMU (non-nested) and then map
> the portal to userspace. The goal that userspace can do DMA to
> associated wq doesn't change the fact that the wq is still *owned*
> and *controlled* by kernel driver. However as far as passthrough
> is concerned, the wq is considered 'owned' by the guest driver thus
> we need an interface which can support low-level *controllability*
> from guest driver. It is sort of a mess in uAPI when mixing the
> two together.

So userspace drivers like DPDK can use both of the two uAPIs?

>
> Based on above two reasons, we see distinct requirements between
> userspace DMA and passthrough interfaces, at least in idxd context
> (though other devices may have less distinction in-between). Therefore,
> we didn't see the value/necessity of reinventing the wheel that mdev
> already handles well to evolve a simple application-oriented userspace
> DMA interface to a complex guest-driver-oriented passthrough interface.
> The complexity of doing so would incur far more kernel-side changes
> than the portion of emulation code that you've been concerned about...
>
>> I don't think VFIO should be the only entry point to
>> virtualization. If we say the universe of devices doing user space DMA
>> must also implement a VFIO mdev to plug into virtualization then it
>> will be a lot of mdevs.
> Certainly VFIO will not be the only entry point, and this has to be a
> case-by-case decision.

The problem is that if we tie all controls to the VFIO uAPI, other
subsystems like vDPA are likely to duplicate them. I wonder if there is a
way to decouple vSVA from the VFIO uAPI?

> If a userspace DMA interface can be easily
> adapted to be a passthrough one, it might be the choice.

It's not that easy even for VFIO, which required a lot of new uAPIs and
infrastructure (e.g. mdev) to be invented.

> But for idxd,
> we see mdev a much better fit here, given the big difference between
> what userspace DMA requires and what guest driver requires in this hw.

A weak point of mdev is that it can't serve kernel subsystems other than
VFIO. In that case, you need some other infrastructure (like [1]) to do
this.

(For idxd, you probably don't need this, but it's pretty common in the
case of networking or storage devices.)

Thanks

[1] https://patchwork.kernel.org/patch/11280547/

>
>> I would prefer to see that the existing userspace interface have the
>> extra needed bits for virtualization (eg by having appropriate
>> internal kernel APIs to make this easy) and all the emulation to build
>> the synthetic PCI device be done in userspace.
> In the end what decides the direction is the amount of changes that
> we have to put in kernel, not whether we call it 'emulation'. For idxd,
> adding special passthrough requirements (guest SVA, dirty tracking,
> etc.) and raw controllability to the simple userspace DMA interface
> is for sure making kernel more complex than reusing the mdev
> framework (plus some degree of emulation mockup behind). Not to
> mention the merit of uAPI compatibility with mdev...
>
> Thanks
> Kevin
>