From: Jean-Philippe Brucker <jean-philippe@linaro.org>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Eric Auger <eric.auger@redhat.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Joerg Roedel <joro@8bytes.org>,
	peter.maydell@linaro.org, kvm@vger.kernel.org,
	vivek.gautam@arm.com, kvmarm@lists.cs.columbia.edu,
	eric.auger.pro@gmail.com, ashok.raj@intel.com, maz@kernel.org,
	vsethi@nvidia.com, zhangfei.gao@linaro.org, kevin.tian@intel.com,
	will@kernel.org, alex.williamson@redhat.com,
	wangxingang5@huawei.com, linux-kernel@vger.kernel.org,
	lushenming@huawei.com, iommu@lists.linux-foundation.org,
	robin.murphy@arm.com
Subject: Re: [RFC v16 1/9] iommu: Introduce attach/detach_pasid_table API
Date: Wed, 8 Dec 2021 17:20:39 +0000	[thread overview]
Message-ID: <YbDpZ0pf7XeZcc7z@myrica> (raw)
In-Reply-To: <20211208125616.GN6385@nvidia.com>

On Wed, Dec 08, 2021 at 08:56:16AM -0400, Jason Gunthorpe wrote:
> From a progress perspective I would like to start with simple 'page
> tables in userspace', ie no PASID in this step.
> 
> 'page tables in userspace' means an iommufd ioctl to create an
> iommu_domain where the IOMMU HW is directly traversing a
> device-specific page table structure in user space memory. All the HW
> today implements this by using another iommu_domain to allow the IOMMU
> HW DMA access to user memory - ie nesting or multi-stage or whatever.
> 
> This would come along with some ioctls to invalidate the IOTLB.
> 
> I'm imagining this step as an iommu_group->op->create_user_domain()
> driver callback which will create a new kind of domain with
> domain-unique ops, i.e. the map/unmap-related ops should all be NULL
> as those are impossible operations.
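
As a rough sketch of what such "domain-unique ops" could look like (all
names here are invented for illustration, not the real iommufd or iommu
core interface):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch only: a user-owned domain exposes no map/unmap
 * because the HW walks a page table that lives in user memory, so a
 * generic caller must treat a NULL op as "operation impossible".
 * All names are invented for this sketch.
 */
struct demo_domain_ops {
	int (*attach_dev)(void *domain, void *dev);
	int (*map)(void *domain, unsigned long iova,
		   unsigned long paddr, size_t size);
	int (*unmap)(void *domain, unsigned long iova, size_t size);
};

static int demo_user_domain_attach(void *domain, void *dev)
{
	(void)domain; (void)dev;
	return 0;	/* route the device's DMA to this domain */
}

/* map/unmap stay NULL: userspace, not the kernel, edits the page table */
static const struct demo_domain_ops demo_user_domain_ops = {
	.attach_dev	= demo_user_domain_attach,
	.map		= NULL,
	.unmap		= NULL,
};

/* how a generic caller would probe for kernel-side mapping support */
static int demo_domain_can_map(const struct demo_domain_ops *ops)
{
	return ops->map != NULL;
}
```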
> 
> From there the usual struct device (ie RID) attach/detach stuff needs
> to take care of routing DMAs to this iommu_domain.
> 
> Step two would be to add the ability for an iommufd using driver to
> request that a RID&PASID is connected to an iommu_domain. This
> connection can be requested for any kind of iommu_domain, kernel owned
> or user owned.
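
A minimal model of that second step might be a lookup keyed by the
(RID, PASID) pair (a toy in-memory table; every name below is invented,
and a real driver would program HW tables instead):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of "step two": connect a (RID, PASID) pair to an
 * iommu_domain, without caring whether the domain is kernel- or
 * user-owned. Names invented for illustration only.
 */
#define DEMO_MAX_ATTACH 8

struct demo_attach {
	unsigned int rid;
	unsigned int pasid;
	void *domain;	/* kernel- or user-owned; the API doesn't care */
	int used;
};

static struct demo_attach demo_tbl[DEMO_MAX_ATTACH];

static int demo_attach_pasid(unsigned int rid, unsigned int pasid,
			     void *domain)
{
	for (int i = 0; i < DEMO_MAX_ATTACH; i++) {
		if (!demo_tbl[i].used) {
			demo_tbl[i] =
			    (struct demo_attach){ rid, pasid, domain, 1 };
			return 0;
		}
	}
	return -1;	/* table full */
}

/* what the IOMMU conceptually does for each incoming tagged DMA */
static void *demo_route_dma(unsigned int rid, unsigned int pasid)
{
	for (int i = 0; i < DEMO_MAX_ATTACH; i++)
		if (demo_tbl[i].used && demo_tbl[i].rid == rid &&
		    demo_tbl[i].pasid == pasid)
			return demo_tbl[i].domain;
	return NULL;	/* unattached: the access faults */
}
```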
> 
> I don't quite have an answer how exactly the SMMUv3 vs Intel
> difference in PASID routing should be resolved.

In SMMUv3 the user pgd is always stored in the PASID table (actually
called "context descriptor table" but I want to avoid confusion with the
VT-d "context table"). And to access the PASID table, the SMMUv3 first
translates its GPA into a PA using the stage-2 page table. For userspace to
pass individual pgds to the kernel, as opposed to passing whole PASID
tables, the host kernel needs to reserve GPA space and map it in stage-2,
so it can store the PASID table in there. Userspace manages GPA space.
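
The two-step lookup described above can be sketched as follows (a toy
model with one word per "page" and flat single-level tables; everything
here is invented for illustration, not actual SMMUv3 code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the SMMUv3 lookup order: the PASID-table pointer the
 * guest programs is a GPA, so it must go through the stage-2
 * (GPA -> PA) table before the entry holding the user pgd can be read.
 * 4KiB page granule; all names invented.
 */
#define DEMO_PAGES 8

static uint64_t demo_phys[DEMO_PAGES];		/* one word per "page" */
static unsigned int demo_stage2[DEMO_PAGES];	/* GPA page -> PA page */

static uint64_t *demo_s2_translate(uint64_t gpa)
{
	unsigned int pa_page = demo_stage2[gpa >> 12];
	return &demo_phys[pa_page];	/* in-page offset omitted: 1 word/page */
}

/* fetch the user pgd: stage-2 walk first, then read the PASID entry */
static uint64_t demo_fetch_user_pgd(uint64_t pasid_table_gpa)
{
	uint64_t *entry = demo_s2_translate(pasid_table_gpa);
	return *entry;
}
```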

This would be easy for a single pgd: in that case the PASID table has a
single entry and userspace could just pass one GPA page during
registration. However, it isn't easily generalized to full PASID support,
because managing a multi-level PASID table would require runtime GPA
allocation, and that API is awkward. That's why we opted for an "attach
PASID table" operation rather than "attach page table" (back then the
choice was easy since VT-d used the same concept).

So I think the simplest way to support nesting is still to have separate
modes of operation depending on the hardware.

Thanks,
Jean

> 
> To get answers, I'm hoping to start building some sketch RFCs for these
> different things on iommufd, hopefully in January. I'm looking at user
> page tables, PASID, dirty tracking and userspace IO fault handling as
> the main features iommufd must tackle.
> 
> The purpose of the sketches would be to validate that the HW features
> we want to expose can work well with the choices the base is making.
> 
> Jason

Thread overview: 116+ messages

2021-10-27 10:44 [RFC v16 0/9] SMMUv3 Nested Stage Setup (IOMMU part) Eric Auger
2021-10-27 10:44 ` [RFC v16 1/9] iommu: Introduce attach/detach_pasid_table API Eric Auger
2021-12-06 10:48   ` Joerg Roedel
2021-12-07 10:22     ` Eric Auger
2021-12-08  2:44       ` Lu Baolu
2021-12-08  7:33         ` Eric Auger
2021-12-08 12:56           ` Jason Gunthorpe
2021-12-08 17:20             ` Jean-Philippe Brucker [this message]
2021-12-08 18:31               ` Jason Gunthorpe
2021-12-09  2:58                 ` Tian, Kevin
     [not found]                 ` <BN9PR11MB527624080CB9302481B74C7A8C709@BN9PR11MB5276.namprd11.prod.outlook.com>
2021-12-09  3:59                   ` Tian, Kevin
2021-12-09 16:08                     ` Jason Gunthorpe
2021-12-10  8:56                       ` Tian, Kevin
2021-12-10 13:23                         ` Jason Gunthorpe
2021-12-11  3:57                           ` Tian, Kevin
2021-12-16 20:48                             ` Jason Gunthorpe
2022-01-04  2:42                               ` Tian, Kevin
2021-12-11  5:18                           ` Tian, Kevin
2021-12-09  7:50                 ` Eric Auger
2021-12-09 15:40                   ` Jason Gunthorpe
2021-12-09 16:37                     ` Eric Auger
2021-12-09  3:21             ` Tian, Kevin
2021-12-09  9:44               ` Eric Auger
2021-12-09  8:31             ` Eric Auger
2021-10-27 10:44 ` [RFC v16 2/9] iommu: Introduce iommu_get_nesting Eric Auger
2021-10-27 22:15   ` kernel test robot
2021-10-28  3:22   ` kernel test robot
2021-10-27 10:44 ` [RFC v16 3/9] iommu/smmuv3: Allow s1 and s2 configs to coexist Eric Auger
2021-10-27 10:44 ` [RFC v16 4/9] iommu/smmuv3: Get prepared for nested stage support Eric Auger
2021-10-27 10:44 ` [RFC v16 5/9] iommu/smmuv3: Implement attach/detach_pasid_table Eric Auger
2021-10-27 10:44 ` [RFC v16 6/9] iommu/smmuv3: Allow stage 1 invalidation with unmanaged ASIDs Eric Auger
2021-10-27 10:44 ` [RFC v16 7/9] iommu/smmuv3: Implement cache_invalidate Eric Auger
2021-10-27 10:44 ` [RFC v16 8/9] iommu/smmuv3: report additional recoverable faults Eric Auger
2021-10-27 21:05   ` kernel test robot
2021-10-27 22:41   ` kernel test robot
2021-10-27 10:44 ` [RFC v16 9/9] iommu/smmuv3: Disallow nested mode in presence of HW MSI regions Eric Auger
2021-12-03 12:27 ` [RFC v16 0/9] SMMUv3 Nested Stage Setup (IOMMU part) Zhangfei Gao
2021-12-07 10:27   ` Eric Auger
2021-12-07 10:35     ` Zhangfei Gao
2021-12-07 11:06       ` Eric Auger
2021-12-08 13:33         ` Shameerali Kolothum Thodi
2021-12-03 13:13 ` Sumit Gupta
2021-12-07 10:28   ` Eric Auger
