From: "Liu, Yi L" <yi.l.liu@intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>, "Tian, Kevin" <kevin.tian@intel.com>
Cc: Jason Wang <jasowang@redhat.com>,
	"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"jacob.jun.pan@linux.intel.com" <jacob.jun.pan@linux.intel.com>,
	"Raj, Ashok" <ashok.raj@intel.com>,
	"Tian, Jun J" <jun.j.tian@intel.com>,
	"Sun, Yi Y" <yi.y.sun@intel.com>,
	"jean-philippe@linaro.org" <jean-philippe@linaro.org>,
	"peterx@redhat.com" <peterx@redhat.com>,
	"Wu, Hao" <hao.wu@intel.com>,
	"stefanha@gmail.com" <stefanha@gmail.com>,
	"iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Date: Mon, 19 Oct 2020 08:39:03 +0000	[thread overview]
Message-ID: <DM5PR11MB1435A3AEC0637C4531F2FE92C31E0@DM5PR11MB1435.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20201016153632.GM6219@nvidia.com>

Hi Jason,

Good to see your response.

> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Friday, October 16, 2020 11:37 PM
> 
> On Wed, Oct 14, 2020 at 03:16:22AM +0000, Tian, Kevin wrote:
> > Hi, Alex and Jason (G),
> >
> > What is your opinion on this new proposal? For now it looks like both
> > Jason (W) and Jean are OK with this direction, and more discussion is
> > probably required for the new /dev/ioasid interface. Internally
> > we're doing a quick prototype to see whether this separation has any
> > unforeseen issues.
> 
> Assuming VDPA and VFIO will be the only two users, so that duplicating
> everything only twice is fine, sounds pretty limiting to me.
> 
> > > Second, IOMMU nested translation is a per IOMMU domain
> > > capability. Since IOMMU domains are managed by VFIO/VDPA
> > >  (alloc/free domain, attach/detach device, set/get domain attribute,
> > > etc.), reporting/enabling the nesting capability is a natural
> > > extension to the domain uAPI of existing passthrough frameworks.
> > > Actually, VFIO already includes a nesting enable interface even
> > > before this series. So it doesn't make sense to generalize this uAPI
> > > out.
> 
> The subsystem that obtains an IOMMU domain for a device would have to
> register it with an open FD of the '/dev/sva'. That is the connection
> between the two subsystems. It would be some simple kernel internal
> stuff:
> 
>   sva = get_sva_from_file(fd);

Is this fd provided by userspace? I suppose /dev/sva would have a set of
uAPIs that finally program page tables into the host iommu driver. That
seems weird for a VFIO user: why should a VFIO user connect to a /dev/sva
fd after it has already set a proper iommu type on the opened container?
The VFIO container already stands for an iommu context through which
userspace can program page mappings into the host iommu.
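
For reference, this is roughly what a VFIO user already does today, where
the container fd alone represents the iommu context (a minimal sketch of
the standard type1 flow in C; the group number, addresses and sizes are
made-up examples, and error handling is omitted):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/vfio.h>

    int container = open("/dev/vfio/vfio", O_RDWR);
    int group = open("/dev/vfio/26", O_RDWR);     /* example group */

    /* bind the group to the container, then pick the iommu backend */
    ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
    ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);

    /* the container fd is the iommu context: map memory through it */
    void *buf = mmap(NULL, 0x100000, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    struct vfio_iommu_type1_dma_map map = {
        .argsz = sizeof(map),
        .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
        .vaddr = (__u64)buf,
        .iova  = 0x100000,
        .size  = 0x100000,
    };
    ioctl(container, VFIO_IOMMU_MAP_DMA, &map);

The question is why a second fd should now be needed on top of this.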

>   sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain);

So this is supposed to be called by VFIO/VDPA to register the info with
/dev/sva, right? And /dev/sva would also maintain the device/iommu_domain
and pasid info? Wouldn't that be duplicated with what VFIO/VDPA already
track?
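
To make the question concrete: the registration above might end up
looking roughly like the kernel-side sketch below (purely illustrative;
struct sva_context, get_sva_from_file() and
sva_register_device_to_pasid() are placeholders from this thread, not an
existing API):

    /* hypothetical glue, called by VFIO/VDPA with a domain they own */
    int vfio_bind_sva(int sva_fd, u32 pasid, struct pci_dev *pdev,
                      struct iommu_domain *domain)
    {
            struct sva_context *sva;

            /* resolve the userspace-provided fd to an sva context */
            sva = get_sva_from_file(sva_fd);
            if (IS_ERR(sva))
                    return PTR_ERR(sva);

            /*
             * Record {device, domain, pasid} in /dev/sva so later page
             * faults/invalidations can be routed -- this is the state
             * that would be duplicated with what VFIO/VDPA keep.
             */
            return sva_register_device_to_pasid(sva, pasid, pdev, domain);
    }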

> Not sure why this is a roadblock?
> 
> How would this be any different from having some kernel libsva that
> VDPA and VFIO would both rely on?
> 
> You don't plan to just open code all this stuff in VFIO, do you?
> 
> > > Then the tricky part comes with the remaining operations (3/4/5),
> > > which are all backed by iommu_ops thus effective only within an
> > > IOMMU domain. To generalize them, the first thing is to find a way
> > > to associate the sva_FD (opened through generic /dev/sva) with an
> > > IOMMU domain that is created by VFIO/VDPA. The second thing is
> > > to replicate {domain<->device/subdevice} association in /dev/sva
> > > path because some operations (e.g. page fault) are triggered/handled
> > > per device/subdevice. Therefore, /dev/sva must provide both per-
> > > domain and per-device uAPIs similar to what VFIO/VDPA already
> > > does.
> 
> Yes, the point here was to move the general APIs out of VFIO and into
> a sharable location. So, of course one would expect some duplication
> during the transition period.
> 
> > > Moreover, mapping a page fault to a subdevice requires pre-
> > > registering subdevice fault data with the IOMMU layer when binding
> > > the guest page table, while such fault data can only be retrieved
> > > from the parent driver through VFIO/VDPA.
> 
> Not sure what this means; a page fault should be tied to the PASID, and
> any hookup needed for that should be done in-kernel when the device is
> connected to the PASID.

You may refer to chapter 7.4.1.1 of the VT-d spec. A page request is
reported to software together with the requestor id of the device. For a
page request injected into the guest, it should carry the device info.
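
In other words, the identifier the IOMMU reports is per device, so the
fault path needs a device-aware lookup before anything can be injected.
A simplified illustration in C (the struct condenses the VT-d page
request descriptor to the fields relevant here; fault_lookup() and
inject_to_guest() are hypothetical helpers):

    struct page_req {
            u16  rid;      /* requestor id: bus/devfn of the device */
            u32  pasid;    /* process address space id, if present */
            u64  addr;     /* faulting page address */
    };

    static void handle_page_req(struct page_req *req)
    {
            /*
             * The requestor id identifies the parent device; mapping
             * it to a subdevice needs fault data that only the parent
             * driver (via VFIO/VDPA) could have pre-registered.
             */
            struct fault_target *t = fault_lookup(req->rid, req->pasid);

            if (t)
                    inject_to_guest(t, req);
    }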

Regards,
Yi Liu

> 
> > > space but they may be organized in multiple IOMMU domains based
> > > on their bus type. How (should we let) the userspace know the
> > > domain information and open an sva_FD for each domain is the main
> > > problem here.
> 
> Why is one sva_FD per iommu domain required? The HW can attach the
> same PASID to multiple iommu domains, right?
> 
> > > In the end we just realized that doing such generalization doesn't
> > > really lead to a clear design and instead requires tight coordination
> > > between /dev/sva and VFIO/VDPA for almost every new uAPI
> > > (especially about synchronization when the domain/device
> > > association is changed or when the device/subdevice is being reset/
> > > drained). Finally, it may become a usability burden for userspace
> > > to use the two interfaces properly on the assigned device.
> 
> If you have a list of things that need to be done to attach a PCI
> device to a PASID then of course they should be tidy kernel APIs
> already, and not just hard wired into VFIO.
> 
> The worst outcome would be for VDPA and VFIO to have two different
> ways to do all of this, each with a different set of bugs. Bug fixes
> and new features in VFIO won't flow over to VDPA.
> 
> Jason

Thread overview: 110+ messages
2020-10-12  8:38 (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs Tian, Kevin
2020-10-13  6:22 ` Jason Wang
2020-10-14  3:08   ` Tian, Kevin
2020-10-14 23:10     ` Alex Williamson
2020-10-15  7:02       ` Jason Wang
2020-10-15  6:52     ` Jason Wang
2020-10-15  7:58       ` Tian, Kevin
2020-10-15  8:40         ` Jason Wang
2020-10-15 10:14           ` Liu, Yi L
2020-10-20  6:18             ` Jason Wang
2020-10-20  8:19               ` Liu, Yi L
2020-10-20  9:19                 ` Jason Wang
2020-10-20  9:40                   ` Liu, Yi L
2020-10-20 13:54                     ` Jason Gunthorpe
2020-10-20 14:00                       ` Liu, Yi L
2020-10-20 14:05                         ` Jason Gunthorpe
2020-10-20 14:09                           ` Liu, Yi L
2020-10-13 10:27 ` Jean-Philippe Brucker
2020-10-14  2:11   ` Tian, Kevin
2020-10-14  3:16 ` Tian, Kevin
2020-10-16 15:36   ` Jason Gunthorpe
2020-10-19  8:39     ` Liu, Yi L [this message]
2020-10-19 14:25       ` Jason Gunthorpe
2020-10-20 10:21         ` Liu, Yi L
2020-10-20 14:02           ` Jason Gunthorpe
2020-10-20 14:19             ` Liu, Yi L
2020-10-21  2:21               ` Jason Wang
2020-10-20 16:24             ` Raj, Ashok
2020-10-20 17:03               ` Jason Gunthorpe
2020-10-20 19:51                 ` Raj, Ashok
2020-10-20 19:55                   ` Jason Gunthorpe
2020-10-20 20:08                     ` Raj, Ashok
2020-10-20 20:14                       ` Jason Gunthorpe
2020-10-20 20:27                         ` Raj, Ashok
2020-10-21 11:48                           ` Jason Gunthorpe
2020-10-21 17:51                             ` Raj, Ashok
2020-10-21 18:24                               ` Jason Gunthorpe
2020-10-21 20:03                                 ` Raj, Ashok
2020-10-21 23:32                                   ` Jason Gunthorpe
2020-10-21 23:53                                     ` Raj, Ashok
2020-10-22  2:55                               ` Jason Wang
2020-10-22  3:54                                 ` Liu, Yi L
2020-10-22  4:38                                   ` Jason Wang
2020-11-03  9:52 ` joro
2020-11-03 12:56   ` Jason Gunthorpe
2020-11-03 13:18     ` joro
2020-11-03 13:23       ` Jason Gunthorpe
2020-11-03 14:03         ` joro
2020-11-03 14:06           ` Jason Gunthorpe
2020-11-03 14:35             ` joro
2020-11-03 15:22               ` Jason Gunthorpe
2020-11-03 16:55                 ` joro
2020-11-03 17:48                   ` Jason Gunthorpe
2020-11-03 19:14                     ` joro
2020-11-04 19:29                       ` Jason Gunthorpe
