iommu.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 00/12]  x86: tag application address space for devices
@ 2020-07-13 23:47 Fenghua Yu
  2020-07-13 23:47 ` [PATCH v6 01/12] iommu: Change type of pasid to u32 Fenghua Yu
                   ` (11 more replies)
  0 siblings, 12 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:47 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

Hi, Thomas, Joerg, and other maintainers,

This series only has one change in patch 1 on top of v5 (see change log).
Could you please consider to merge it upstream?

Thanks.

-Fenghua

=====

Typical hardware devices require a driver stack to translate application
buffers to hardware addresses, and a kernel-user transition to notify the
hardware of new work. What if both the translation and transition overhead
could be eliminated? This is what Shared Virtual Address (SVA) and ENQCMD
enabled hardware like Data Streaming Accelerator (DSA) aims to achieve.
Applications map portals in their local-address-space and directly submit
work to them using a new instruction.

This series enables ENQCMD and associated management of the new MSR
(MSR_IA32_PASID). This new MSR allows an application address space to be
associated with what the PCIe spec calls a Process Address Space ID (PASID).
This PASID tag is carried along with all requests between applications and
devices and allows devices to interact with the process address space.

SVA and ENQCMD enabled device drivers need this series. The phase 2 DSA
patches with SVA and ENQCMD support was released on the top of this series:
https://lore.kernel.org/patchwork/cover/1244060/

This series only provides simple and basic support for ENQCMD and the MSR:
1. Clean up type definitions (patch 1-2). These patches can be in a
   separate series.
   - Define "pasid" as "u32" consistently
   - Define "flags" as "unsigned int"
2. Explain different various technical terms used in the series (patch 3).
3. Enumerate support for ENQCMD in the processor (patch 4).
4. Handle FPU PASID state and the MSR during context switch (patches 5-6).
5. Define "pasid" in mm_struct (patch 7).
5. Clear PASID state for new mm and forked and cloned thread (patch 8-9).
6. Allocate and free PASID for a process (patch 10).
7. Fix up the PASID MSR in #GP handler when one thread in a process
   executes ENQCMD for the first time (patches 11-12).

This patch series and the DSA phase 2 series are in
https://github.com/intel/idxd-driver/tree/idxd-stage2

References:
1. Detailed information on the ENQCMD/ENQCMDS instructions and the
IA32_PASID MSR can be found in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

2. Detailed information on DSA can be found in DSA specification:
https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

Chang log:
v6:
- Change return type to u32 for kfd_pasid_alloc() in patch 1 (Felix)

v5:
- Mark ENQCMD disabled when configured out and use cpu_feature_enabled()
  to simplify the feature checking code in patch 10 and 12 (PeterZ and
  Dave Hansen)
- Add Reviewed-by: Lu Baolu to patch 1, 2, 10, and 12.

v4:
- Define PASID as "u32" instead of "unsigned int" in patch 1, 7, 10, 12.
  (Christoph)
- Drop v3 patch 2 which changes PASID type in ocxl because it's not related
  to x86 and was rejected by ocxl maintainer Frederic Barrat
- A split patch which changes PASID type to u32 in crypto/hisilicon/qm.c
  was released separately to linux-crypto mailing list because it's not
  related to x86 and is a standalone patch:

v3:
- Change names of bind_mm() and unbind_mm() to match to new APIs in
  patch 4 (Baolu)
- Change CONFIG_PCI_PASID to CONFIG_IOMMU_SUPPORT because non-PCI device
  can have PASID in ARM in patch 8 (Jean)
- Add a few sanity checks in __free_pasid() and alloc_pasid() in
  patch 11 (Baolu)
- Add patch 12 to define a new flag "has_valid_pasid" for a task and
  use the flag to identify if the task has a valid PASID MSR (PeterZ)
- Add fpu__pasid_write() to update the MSR in fixup() in patch 13
- Check if mm->pasid can be found in fixup() in patch 13

v2:
- Add patches 1-3 to define "pasid" and "flags" as "unsigned int"
  consistently (Thomas)
  (these 3 patches could be in a separate patch set)
- Add patch 8 to move "pasid" to generic mm_struct (Christoph).
  Jean-Philippe Brucker released a virtually same patch. Upstream only
  needs one of the two.
- Add patch 9 to initialize PASID in a new mm.
- Plus other changes described in each patch (Thomas)

Ashok Raj (1):
  docs: x86: Add documentation for SVA (Shared Virtual Addressing)

Fenghua Yu (9):
  iommu: Change type of pasid to u32
  iommu/vt-d: Change flags type to unsigned int in binding mm
  x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
  x86/msr-index: Define IA32_PASID MSR
  mm: Define pasid in mm
  fork: Clear PASID for new mm
  x86/process: Clear PASID state for a newly forked/cloned thread
  x86/mmu: Allocate/free PASID
  x86/traps: Fix up invalid PASID

Peter Zijlstra (1):
  sched: Define and initialize a flag to identify valid PASID in the
    task

Yu-cheng Yu (1):
  x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

 Documentation/x86/index.rst                   |   1 +
 Documentation/x86/sva.rst                     | 287 ++++++++++++++++++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/disabled-features.h      |   9 +-
 arch/x86/include/asm/fpu/types.h              |  10 +
 arch/x86/include/asm/fpu/xstate.h             |   2 +-
 arch/x86/include/asm/iommu.h                  |   3 +
 arch/x86/include/asm/mmu_context.h            |  11 +
 arch/x86/include/asm/msr-index.h              |   3 +
 arch/x86/kernel/cpu/cpuid-deps.c              |   1 +
 arch/x86/kernel/fpu/xstate.c                  |   4 +
 arch/x86/kernel/process.c                     |  18 ++
 arch/x86/kernel/traps.c                       |  12 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |   4 +-
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c    |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |   2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c       |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h       |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |   8 +-
 .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c       |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h       |   2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c       |   8 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.h       |   4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c        |   6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_pasid.c        |   4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  20 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   2 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |   2 +-
 drivers/iommu/amd/amd_iommu.h                 |  10 +-
 drivers/iommu/amd/iommu.c                     |  31 +-
 drivers/iommu/amd/iommu_v2.c                  |  20 +-
 drivers/iommu/intel/dmar.c                    |   7 +-
 drivers/iommu/intel/intel-pasid.h             |  24 +-
 drivers/iommu/intel/iommu.c                   |   4 +-
 drivers/iommu/intel/pasid.c                   |  31 +-
 drivers/iommu/intel/svm.c                     | 225 ++++++++++++--
 drivers/iommu/iommu.c                         |   2 +-
 drivers/misc/uacce/uacce.c                    |   2 +-
 include/linux/amd-iommu.h                     |   8 +-
 include/linux/intel-iommu.h                   |  14 +-
 include/linux/intel-svm.h                     |   2 +-
 include/linux/iommu.h                         |  10 +-
 include/linux/mm_types.h                      |   6 +
 include/linux/sched.h                         |   3 +
 include/linux/uacce.h                         |   2 +-
 kernel/fork.c                                 |  12 +
 54 files changed, 721 insertions(+), 159 deletions(-)
 create mode 100644 Documentation/x86/sva.rst

-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v6 01/12] iommu: Change type of pasid to u32
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
@ 2020-07-13 23:47 ` Fenghua Yu
  2020-07-14  2:45   ` Liu, Yi L
  2020-07-22 14:03   ` Joerg Roedel
  2020-07-13 23:47 ` [PATCH v6 02/12] iommu/vt-d: Change flags type to unsigned int in binding mm Fenghua Yu
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:47 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with uapi
definitions, define PASID and its variations (e.g. max PASID) as "u32".
"u32" is also shorter and a little more explicit than "unsigned int".

No PASID type change in uapi although it defines PASID as __u64 in
some places.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
v6:
- Change return type to u32 for kfd_pasid_alloc() (Felix)

v5:
- Reviewed by Lu Baolu

v4:
- Change PASID type from "unsigned int" to "u32" (Christoph)

v2:
- Create this new patch to define PASID as "unsigned int" consistently in
  iommu (Thomas)

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 +--
 .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c    |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  2 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c       |  6 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h       |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  8 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |  8 ++---
 .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c       |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h       |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_events.c       |  8 ++---
 drivers/gpu/drm/amd/amdkfd/kfd_events.h       |  4 +--
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c        |  6 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_pasid.c        |  4 +--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         | 20 ++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |  2 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |  2 +-
 drivers/iommu/amd/amd_iommu.h                 | 10 +++---
 drivers/iommu/amd/iommu.c                     | 31 ++++++++++---------
 drivers/iommu/amd/iommu_v2.c                  | 20 ++++++------
 drivers/iommu/intel/dmar.c                    |  7 +++--
 drivers/iommu/intel/intel-pasid.h             | 24 +++++++-------
 drivers/iommu/intel/iommu.c                   |  4 +--
 drivers/iommu/intel/pasid.c                   | 31 +++++++++----------
 drivers/iommu/intel/svm.c                     | 12 +++----
 drivers/iommu/iommu.c                         |  2 +-
 drivers/misc/uacce/uacce.c                    |  2 +-
 include/linux/amd-iommu.h                     |  8 ++---
 include/linux/intel-iommu.h                   | 12 +++----
 include/linux/intel-svm.h                     |  2 +-
 include/linux/iommu.h                         | 10 +++---
 include/linux/uacce.h                         |  2 +-
 38 files changed, 141 insertions(+), 141 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ffe149aafc39..dfef5a7e0f5a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -207,11 +207,11 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct kgd_dev *dst, struct kgd_dev *s
 	})
 
 /* GPUVM API */
-int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
 					void **vm, void **process_info,
 					struct dma_fence **ef);
 int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
-					struct file *filp, unsigned int pasid,
+					struct file *filp, u32 pasid,
 					void **vm, void **process_info,
 					struct dma_fence **ef);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
index bf927f432506..ee531c3988d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
@@ -105,7 +105,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 	unlock_srbm(kgd);
 }
 
-static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
 					unsigned int vmid)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 744366c7ee85..4d41317b9292 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -139,7 +139,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 	unlock_srbm(kgd);
 }
 
-static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
 					unsigned int vmid)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index feab4cc6e836..35917d4b50f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -96,7 +96,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 	unlock_srbm(kgd);
 }
 
-static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
 					unsigned int vmid)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index c7fd0c47b254..b4b88e4da4de 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -110,7 +110,7 @@ void kgd_gfx_v9_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 	unlock_srbm(kgd);
 }
 
-int kgd_gfx_v9_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+int kgd_gfx_v9_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
 					unsigned int vmid)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
index aedf67d57449..ff2bc72e6646 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
@@ -26,7 +26,7 @@ void kgd_gfx_v9_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 		uint32_t sh_mem_config,
 		uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit,
 		uint32_t sh_mem_bases);
-int kgd_gfx_v9_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+int kgd_gfx_v9_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
 		unsigned int vmid);
 int kgd_gfx_v9_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
 int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index b91b5171270f..9d06366320bc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -994,7 +994,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 	return ret;
 }
 
-int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasid,
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
 					  void **vm, void **process_info,
 					  struct dma_fence **ef)
 {
@@ -1030,7 +1030,7 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int pasi
 }
 
 int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
-					   struct file *filp, unsigned int pasid,
+					   struct file *filp, u32 pasid,
 					   void **vm, void **process_info,
 					   struct dma_fence **ef)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index fe92dcd94d4a..3f697e9010d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -43,7 +43,7 @@ static DEFINE_IDA(amdgpu_pasid_ida);
 /* Helper to free pasid from a fence callback */
 struct amdgpu_pasid_cb {
 	struct dma_fence_cb cb;
-	unsigned int pasid;
+	u32 pasid;
 };
 
 /**
@@ -79,7 +79,7 @@ int amdgpu_pasid_alloc(unsigned int bits)
  * amdgpu_pasid_free - Free a PASID
  * @pasid: PASID to free
  */
-void amdgpu_pasid_free(unsigned int pasid)
+void amdgpu_pasid_free(u32 pasid)
 {
 	trace_amdgpu_pasid_freed(pasid);
 	ida_simple_remove(&amdgpu_pasid_ida, pasid);
@@ -105,7 +105,7 @@ static void amdgpu_pasid_free_cb(struct dma_fence *fence,
  * Free the pasid only after all the fences in resv are signaled.
  */
 void amdgpu_pasid_free_delayed(struct dma_resv *resv,
-			       unsigned int pasid)
+			       u32 pasid)
 {
 	struct dma_fence *fence, **fences;
 	struct amdgpu_pasid_cb *cb;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
index 8e58325bbca2..0c3b4fa1f936 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
@@ -71,9 +71,9 @@ struct amdgpu_vmid_mgr {
 };
 
 int amdgpu_pasid_alloc(unsigned int bits);
-void amdgpu_pasid_free(unsigned int pasid);
+void amdgpu_pasid_free(u32 pasid);
 void amdgpu_pasid_free_delayed(struct dma_resv *resv,
-			       unsigned int pasid);
+			       u32 pasid);
 
 bool amdgpu_vmid_had_gpu_reset(struct amdgpu_device *adev,
 			       struct amdgpu_vmid *id);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index d7e17e34fee1..8503cce467eb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1062,7 +1062,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
 	struct amdgpu_bo_list *list;
 	struct amdgpu_bo *pd;
-	unsigned int pasid;
+	u32 pasid;
 	int handle;
 
 	if (!fpriv)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 7417754e9141..ada2534dd8d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2783,7 +2783,7 @@ long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long timeout)
  * 0 for success, error for failure.
  */
 int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
-		   int vm_context, unsigned int pasid)
+		   int vm_context, u32 pasid)
 {
 	struct amdgpu_bo_param bp;
 	struct amdgpu_bo *root;
@@ -2954,7 +2954,7 @@ static int amdgpu_vm_check_clean_reserved(struct amdgpu_device *adev,
  * 0 for success, -errno for errors.
  */
 int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm,
-			   unsigned int pasid)
+			   u32 pasid)
 {
 	bool pte_support_ats = (adev->asic_type == CHIP_RAVEN);
 	int r;
@@ -3252,7 +3252,7 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
  * @pasid: PASID identifier for VM
  * @task_info: task_info to fill.
  */
-void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
+void amdgpu_vm_get_task_info(struct amdgpu_device *adev, u32 pasid,
 			 struct amdgpu_task_info *task_info)
 {
 	struct amdgpu_vm *vm;
@@ -3296,7 +3296,7 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
  * Try to gracefully handle a VM fault. Return true if the fault was handled and
  * shouldn't be reported any more.
  */
-bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
+bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
 			    uint64_t addr)
 {
 	struct amdgpu_bo *root;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index c8e68d7890bf..c2f23b071f14 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -371,8 +371,8 @@ void amdgpu_vm_manager_fini(struct amdgpu_device *adev);
 
 long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long timeout);
 int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
-		   int vm_context, unsigned int pasid);
-int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm, unsigned int pasid);
+		   int vm_context, u32 pasid);
+int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm, u32 pasid);
 void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
@@ -429,9 +429,9 @@ bool amdgpu_vm_need_pipeline_sync(struct amdgpu_ring *ring,
 				  struct amdgpu_job *job);
 void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
 
-void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
+void amdgpu_vm_get_task_info(struct amdgpu_device *adev, u32 pasid,
 			     struct amdgpu_task_info *task_info);
-bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
+bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
 			    uint64_t addr);
 
 void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
diff --git a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
index 9f59ba93cfe0..c2fd30045ad5 100644
--- a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
@@ -90,7 +90,7 @@ static void cik_event_interrupt_wq(struct kfd_dev *dev,
 			(const struct cik_ih_ring_entry *)ih_ring_entry;
 	uint32_t context_id = ihre->data & 0xfffffff;
 	unsigned int vmid  = (ihre->ring_id & 0x0000ff00) >> 8;
-	unsigned int pasid = (ihre->ring_id & 0xffff0000) >> 16;
+	u32 pasid = (ihre->ring_id & 0xffff0000) >> 16;
 
 	if (pasid == 0)
 		return;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 27bcc5b472f6..b258a3dae767 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -45,7 +45,7 @@ static void dbgdev_address_watch_disable_nodiq(struct kfd_dev *dev)
 }
 
 static int dbgdev_diq_submit_ib(struct kfd_dbgdev *dbgdev,
-				unsigned int pasid, uint64_t vmid0_address,
+				u32 pasid, uint64_t vmid0_address,
 				uint32_t *packet_buff, size_t size_in_bytes)
 {
 	struct pm4__release_mem *rm_packet;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h b/drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h
index a04a1fe1d0d9..f9c6df1fdc5c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h
@@ -275,7 +275,7 @@ struct kfd_dbgdev {
 };
 
 struct kfd_dbgmgr {
-	unsigned int pasid;
+	u32 pasid;
 	struct kfd_dev *dev;
 	struct kfd_dbgdev *dbgdev;
 };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e9c4867abeff..9571a6e5de4c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -40,7 +40,7 @@
 #define CIK_HPD_EOP_BYTES (1U << CIK_HPD_EOP_BYTES_LOG2)
 
 static int set_pasid_vmid_mapping(struct device_queue_manager *dqm,
-					unsigned int pasid, unsigned int vmid);
+				  u32 pasid, unsigned int vmid);
 
 static int execute_queues_cpsch(struct device_queue_manager *dqm,
 				enum kfd_unmap_queues_filter filter,
@@ -909,7 +909,7 @@ static int unregister_process(struct device_queue_manager *dqm,
 }
 
 static int
-set_pasid_vmid_mapping(struct device_queue_manager *dqm, unsigned int pasid,
+set_pasid_vmid_mapping(struct device_queue_manager *dqm, u32 pasid,
 			unsigned int vmid)
 {
 	return dqm->dev->kfd2kgd->set_pasid_vmid_mapping(
@@ -1925,8 +1925,7 @@ void device_queue_manager_uninit(struct device_queue_manager *dqm)
 	kfree(dqm);
 }
 
-int kfd_process_vm_fault(struct device_queue_manager *dqm,
-			 unsigned int pasid)
+int kfd_process_vm_fault(struct device_queue_manager *dqm, u32 pasid)
 {
 	struct kfd_process_device *pdd;
 	struct kfd_process *p = kfd_lookup_process_by_pasid(pasid);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index a9583b95fcc1..ba2c2ce0c55a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -460,7 +460,7 @@ static void set_event_from_interrupt(struct kfd_process *p,
 	}
 }
 
-void kfd_signal_event_interrupt(unsigned int pasid, uint32_t partial_id,
+void kfd_signal_event_interrupt(u32 pasid, uint32_t partial_id,
 				uint32_t valid_id_bits)
 {
 	struct kfd_event *ev = NULL;
@@ -872,7 +872,7 @@ static void lookup_events_by_type_and_signal(struct kfd_process *p,
 }
 
 #ifdef KFD_SUPPORT_IOMMU_V2
-void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
+void kfd_signal_iommu_event(struct kfd_dev *dev, u32 pasid,
 		unsigned long address, bool is_write_requested,
 		bool is_execute_requested)
 {
@@ -950,7 +950,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
 }
 #endif /* KFD_SUPPORT_IOMMU_V2 */
 
-void kfd_signal_hw_exception_event(unsigned int pasid)
+void kfd_signal_hw_exception_event(u32 pasid)
 {
 	/*
 	 * Because we are called from arbitrary context (workqueue) as opposed
@@ -971,7 +971,7 @@ void kfd_signal_hw_exception_event(unsigned int pasid)
 	kfd_unref_process(p);
 }
 
-void kfd_signal_vm_fault_event(struct kfd_dev *dev, unsigned int pasid,
+void kfd_signal_vm_fault_event(struct kfd_dev *dev, u32 pasid,
 				struct kfd_vm_fault_info *info)
 {
 	struct kfd_event *ev;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.h b/drivers/gpu/drm/amd/amdkfd/kfd_events.h
index c7ac6c73af86..c8fe5dbdad55 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.h
@@ -79,7 +79,7 @@ struct kfd_event {
 #define KFD_EVENT_TYPE_DEBUG 5
 #define KFD_EVENT_TYPE_MEMORY 8
 
-extern void kfd_signal_event_interrupt(unsigned int pasid, uint32_t partial_id,
-					uint32_t valid_id_bits);
+extern void kfd_signal_event_interrupt(u32 pasid, uint32_t partial_id,
+				       uint32_t valid_id_bits);
 
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
index 7c8786b9eb0a..e8ef3886688b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -139,7 +139,7 @@ void kfd_iommu_unbind_process(struct kfd_process *p)
 }
 
 /* Callback for process shutdown invoked by the IOMMU driver */
-static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
+static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, u32 pasid)
 {
 	struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
 	struct kfd_process *p;
@@ -185,8 +185,8 @@ static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
 }
 
 /* This function called by IOMMU driver on PPR failure */
-static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
-		unsigned long address, u16 flags)
+static int iommu_invalid_ppr_cb(struct pci_dev *pdev, u32 pasid,
+				unsigned long address, u16 flags)
 {
 	struct kfd_dev *dev;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c b/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
index 33b08ff00b50..c19a2e6fd7c8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
@@ -51,7 +51,7 @@ unsigned int kfd_get_pasid_limit(void)
 	return 1U << pasid_bits;
 }
 
-unsigned int kfd_pasid_alloc(void)
+u32 kfd_pasid_alloc(void)
 {
 	int r;
 
@@ -77,7 +77,7 @@ unsigned int kfd_pasid_alloc(void)
 	return r > 0 ? r : 0;
 }
 
-void kfd_pasid_free(unsigned int pasid)
+void kfd_pasid_free(u32 pasid)
 {
 	if (kfd2kgd)
 		amdgpu_pasid_free(pasid);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index fee60921fccf..09db037bb48f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -715,7 +715,7 @@ struct kfd_process {
 	/* We want to receive a notification when the mm_struct is destroyed */
 	struct mmu_notifier mmu_notifier;
 
-	uint16_t pasid;
+	u32 pasid;
 	unsigned int doorbell_index;
 
 	/*
@@ -790,7 +790,7 @@ int kfd_process_create_wq(void);
 void kfd_process_destroy_wq(void);
 struct kfd_process *kfd_create_process(struct file *filep);
 struct kfd_process *kfd_get_process(const struct task_struct *);
-struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid);
+struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid);
 struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm);
 void kfd_unref_process(struct kfd_process *p);
 int kfd_process_evict_queues(struct kfd_process *p);
@@ -831,8 +831,8 @@ int kfd_pasid_init(void);
 void kfd_pasid_exit(void);
 bool kfd_set_pasid_limit(unsigned int new_limit);
 unsigned int kfd_get_pasid_limit(void);
-unsigned int kfd_pasid_alloc(void);
-void kfd_pasid_free(unsigned int pasid);
+u32 kfd_pasid_alloc(void);
+void kfd_pasid_free(u32 pasid);
 
 /* Doorbells */
 size_t kfd_doorbell_process_slice(struct kfd_dev *kfd);
@@ -917,7 +917,7 @@ void device_queue_manager_uninit(struct device_queue_manager *dqm);
 struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
 					enum kfd_queue_type type);
 void kernel_queue_uninit(struct kernel_queue *kq, bool hanging);
-int kfd_process_vm_fault(struct device_queue_manager *dqm, unsigned int pasid);
+int kfd_process_vm_fault(struct device_queue_manager *dqm, u32 pasid);
 
 /* Process Queue Manager */
 struct process_queue_node {
@@ -1039,12 +1039,12 @@ int kfd_wait_on_events(struct kfd_process *p,
 		       uint32_t num_events, void __user *data,
 		       bool all, uint32_t user_timeout_ms,
 		       uint32_t *wait_result);
-void kfd_signal_event_interrupt(unsigned int pasid, uint32_t partial_id,
+void kfd_signal_event_interrupt(u32 pasid, uint32_t partial_id,
 				uint32_t valid_id_bits);
 void kfd_signal_iommu_event(struct kfd_dev *dev,
-		unsigned int pasid, unsigned long address,
-		bool is_write_requested, bool is_execute_requested);
-void kfd_signal_hw_exception_event(unsigned int pasid);
+			    u32 pasid, unsigned long address,
+			    bool is_write_requested, bool is_execute_requested);
+void kfd_signal_hw_exception_event(u32 pasid);
 int kfd_set_event(struct kfd_process *p, uint32_t event_id);
 int kfd_reset_event(struct kfd_process *p, uint32_t event_id);
 int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
@@ -1055,7 +1055,7 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint64_t *event_page_offset, uint32_t *event_slot_index);
 int kfd_event_destroy(struct kfd_process *p, uint32_t event_id);
 
-void kfd_signal_vm_fault_event(struct kfd_dev *dev, unsigned int pasid,
+void kfd_signal_vm_fault_event(struct kfd_dev *dev, u32 pasid,
 				struct kfd_vm_fault_info *info);
 
 void kfd_signal_reset_event(struct kfd_dev *dev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 0e0c42e9f6a3..7567647e976d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1087,7 +1087,7 @@ void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
 }
 
 /* This increments the process->ref counter. */
-struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid)
+struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid)
 {
 	struct kfd_process *p, *ret_p = NULL;
 	unsigned int temp;
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index a3c238c39ef5..301de493377a 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -226,7 +226,7 @@ struct kfd2kgd_calls {
 			uint32_t sh_mem_config,	uint32_t sh_mem_ape1_base,
 			uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
 
-	int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, unsigned int pasid,
+	int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, u32 pasid,
 					unsigned int vmid);
 
 	int (*init_interrupts)(struct kgd_dev *kgd, uint32_t pipe_id);
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index f892992c8744..1fae58dd3c25 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -45,12 +45,12 @@ extern int amd_iommu_register_ppr_notifier(struct notifier_block *nb);
 extern int amd_iommu_unregister_ppr_notifier(struct notifier_block *nb);
 extern void amd_iommu_domain_direct_map(struct iommu_domain *dom);
 extern int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids);
-extern int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
+extern int amd_iommu_flush_page(struct iommu_domain *dom, u32 pasid,
 				u64 address);
-extern int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid);
-extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
+extern int amd_iommu_flush_tlb(struct iommu_domain *dom, u32 pasid);
+extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, u32 pasid,
 				     unsigned long cr3);
-extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid);
+extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, u32 pasid);
 extern struct iommu_domain *amd_iommu_get_v2_domain(struct pci_dev *pdev);
 
 #ifdef CONFIG_IRQ_REMAP
@@ -66,7 +66,7 @@ static inline int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 #define PPR_INVALID			0x1
 #define PPR_FAILURE			0xf
 
-extern int amd_iommu_complete_ppr(struct pci_dev *pdev, int pasid,
+extern int amd_iommu_complete_ppr(struct pci_dev *pdev, u32 pasid,
 				  int status, int tag);
 
 static inline bool is_rd890_iommu(struct pci_dev *pdev)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 74cca1757172..8398a38af719 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -502,10 +502,11 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
 static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
 {
 	struct device *dev = iommu->iommu.dev;
-	int type, devid, pasid, flags, tag;
+	int type, devid, flags, tag;
 	volatile u32 *event = __evt;
 	int count = 0;
 	u64 address;
+	u32 pasid;
 
 retry:
 	type    = (event[1] >> EVENT_TYPE_SHIFT)  & EVENT_TYPE_MASK;
@@ -898,7 +899,7 @@ static void build_inv_iotlb_pages(struct iommu_cmd *cmd, u16 devid, int qdep,
 		cmd->data[2] |= CMD_INV_IOMMU_PAGES_SIZE_MASK;
 }
 
-static void build_inv_iommu_pasid(struct iommu_cmd *cmd, u16 domid, int pasid,
+static void build_inv_iommu_pasid(struct iommu_cmd *cmd, u16 domid, u32 pasid,
 				  u64 address, bool size)
 {
 	memset(cmd, 0, sizeof(*cmd));
@@ -916,7 +917,7 @@ static void build_inv_iommu_pasid(struct iommu_cmd *cmd, u16 domid, int pasid,
 	CMD_SET_TYPE(cmd, CMD_INV_IOMMU_PAGES);
 }
 
-static void build_inv_iotlb_pasid(struct iommu_cmd *cmd, u16 devid, int pasid,
+static void build_inv_iotlb_pasid(struct iommu_cmd *cmd, u16 devid, u32 pasid,
 				  int qdep, u64 address, bool size)
 {
 	memset(cmd, 0, sizeof(*cmd));
@@ -936,7 +937,7 @@ static void build_inv_iotlb_pasid(struct iommu_cmd *cmd, u16 devid, int pasid,
 	CMD_SET_TYPE(cmd, CMD_INV_IOTLB_PAGES);
 }
 
-static void build_complete_ppr(struct iommu_cmd *cmd, u16 devid, int pasid,
+static void build_complete_ppr(struct iommu_cmd *cmd, u16 devid, u32 pasid,
 			       int status, int tag, bool gn)
 {
 	memset(cmd, 0, sizeof(*cmd));
@@ -2772,7 +2773,7 @@ int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
 }
 EXPORT_SYMBOL(amd_iommu_domain_enable_v2);
 
-static int __flush_pasid(struct protection_domain *domain, int pasid,
+static int __flush_pasid(struct protection_domain *domain, u32 pasid,
 			 u64 address, bool size)
 {
 	struct iommu_dev_data *dev_data;
@@ -2833,13 +2834,13 @@ static int __flush_pasid(struct protection_domain *domain, int pasid,
 	return ret;
 }
 
-static int __amd_iommu_flush_page(struct protection_domain *domain, int pasid,
+static int __amd_iommu_flush_page(struct protection_domain *domain, u32 pasid,
 				  u64 address)
 {
 	return __flush_pasid(domain, pasid, address, false);
 }
 
-int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
+int amd_iommu_flush_page(struct iommu_domain *dom, u32 pasid,
 			 u64 address)
 {
 	struct protection_domain *domain = to_pdomain(dom);
@@ -2854,13 +2855,13 @@ int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
 }
 EXPORT_SYMBOL(amd_iommu_flush_page);
 
-static int __amd_iommu_flush_tlb(struct protection_domain *domain, int pasid)
+static int __amd_iommu_flush_tlb(struct protection_domain *domain, u32 pasid)
 {
 	return __flush_pasid(domain, pasid, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
 			     true);
 }
 
-int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid)
+int amd_iommu_flush_tlb(struct iommu_domain *dom, u32 pasid)
 {
 	struct protection_domain *domain = to_pdomain(dom);
 	unsigned long flags;
@@ -2874,7 +2875,7 @@ int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid)
 }
 EXPORT_SYMBOL(amd_iommu_flush_tlb);
 
-static u64 *__get_gcr3_pte(u64 *root, int level, int pasid, bool alloc)
+static u64 *__get_gcr3_pte(u64 *root, int level, u32 pasid, bool alloc)
 {
 	int index;
 	u64 *pte;
@@ -2906,7 +2907,7 @@ static u64 *__get_gcr3_pte(u64 *root, int level, int pasid, bool alloc)
 	return pte;
 }
 
-static int __set_gcr3(struct protection_domain *domain, int pasid,
+static int __set_gcr3(struct protection_domain *domain, u32 pasid,
 		      unsigned long cr3)
 {
 	struct domain_pgtable pgtable;
@@ -2925,7 +2926,7 @@ static int __set_gcr3(struct protection_domain *domain, int pasid,
 	return __amd_iommu_flush_tlb(domain, pasid);
 }
 
-static int __clear_gcr3(struct protection_domain *domain, int pasid)
+static int __clear_gcr3(struct protection_domain *domain, u32 pasid)
 {
 	struct domain_pgtable pgtable;
 	u64 *pte;
@@ -2943,7 +2944,7 @@ static int __clear_gcr3(struct protection_domain *domain, int pasid)
 	return __amd_iommu_flush_tlb(domain, pasid);
 }
 
-int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
+int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, u32 pasid,
 			      unsigned long cr3)
 {
 	struct protection_domain *domain = to_pdomain(dom);
@@ -2958,7 +2959,7 @@ int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
 }
 EXPORT_SYMBOL(amd_iommu_domain_set_gcr3);
 
-int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid)
+int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, u32 pasid)
 {
 	struct protection_domain *domain = to_pdomain(dom);
 	unsigned long flags;
@@ -2972,7 +2973,7 @@ int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid)
 }
 EXPORT_SYMBOL(amd_iommu_domain_clear_gcr3);
 
-int amd_iommu_complete_ppr(struct pci_dev *pdev, int pasid,
+int amd_iommu_complete_ppr(struct pci_dev *pdev, u32 pasid,
 			   int status, int tag)
 {
 	struct iommu_dev_data *dev_data;
diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index e4b025c5637c..a769985ff26e 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -40,7 +40,7 @@ struct pasid_state {
 	struct mmu_notifier mn;                 /* mmu_notifier handle */
 	struct pri_queue pri[PRI_QUEUE_SIZE];	/* PRI tag states */
 	struct device_state *device_state;	/* Link to our device_state */
-	int pasid;				/* PASID index */
+	u32 pasid;				/* PASID index */
 	bool invalid;				/* Used during setup and
 						   teardown of the pasid */
 	spinlock_t lock;			/* Protect pri_queues and
@@ -70,7 +70,7 @@ struct fault {
 	struct mm_struct *mm;
 	u64 address;
 	u16 devid;
-	u16 pasid;
+	u32 pasid;
 	u16 tag;
 	u16 finish;
 	u16 flags;
@@ -150,7 +150,7 @@ static void put_device_state(struct device_state *dev_state)
 
 /* Must be called under dev_state->lock */
 static struct pasid_state **__get_pasid_state_ptr(struct device_state *dev_state,
-						  int pasid, bool alloc)
+						  u32 pasid, bool alloc)
 {
 	struct pasid_state **root, **ptr;
 	int level, index;
@@ -184,7 +184,7 @@ static struct pasid_state **__get_pasid_state_ptr(struct device_state *dev_state
 
 static int set_pasid_state(struct device_state *dev_state,
 			   struct pasid_state *pasid_state,
-			   int pasid)
+			   u32 pasid)
 {
 	struct pasid_state **ptr;
 	unsigned long flags;
@@ -211,7 +211,7 @@ static int set_pasid_state(struct device_state *dev_state,
 	return ret;
 }
 
-static void clear_pasid_state(struct device_state *dev_state, int pasid)
+static void clear_pasid_state(struct device_state *dev_state, u32 pasid)
 {
 	struct pasid_state **ptr;
 	unsigned long flags;
@@ -229,7 +229,7 @@ static void clear_pasid_state(struct device_state *dev_state, int pasid)
 }
 
 static struct pasid_state *get_pasid_state(struct device_state *dev_state,
-					   int pasid)
+					   u32 pasid)
 {
 	struct pasid_state **ptr, *ret = NULL;
 	unsigned long flags;
@@ -594,7 +594,7 @@ static struct notifier_block ppr_nb = {
 	.notifier_call = ppr_notifier,
 };
 
-int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
+int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
 			 struct task_struct *task)
 {
 	struct pasid_state *pasid_state;
@@ -615,7 +615,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
 		return -EINVAL;
 
 	ret = -EINVAL;
-	if (pasid < 0 || pasid >= dev_state->max_pasids)
+	if (pasid >= dev_state->max_pasids)
 		goto out;
 
 	ret = -ENOMEM;
@@ -679,7 +679,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
 }
 EXPORT_SYMBOL(amd_iommu_bind_pasid);
 
-void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid)
+void amd_iommu_unbind_pasid(struct pci_dev *pdev, u32 pasid)
 {
 	struct pasid_state *pasid_state;
 	struct device_state *dev_state;
@@ -695,7 +695,7 @@ void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid)
 	if (dev_state == NULL)
 		return;
 
-	if (pasid < 0 || pasid >= dev_state->max_pasids)
+	if (pasid >= dev_state->max_pasids)
 		goto out;
 
 	pasid_state = get_pasid_state(dev_state, pasid);
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 683b812c5c47..2b75a1561377 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1466,7 +1466,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 }
 
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
-			  u64 granu, int pasid)
+			  u64 granu, u32 pasid)
 {
 	struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
 
@@ -1780,7 +1780,7 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
 }
 
 static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
-		u8 fault_reason, int pasid, u16 source_id,
+		u8 fault_reason, u32 pasid, u16 source_id,
 		unsigned long long addr)
 {
 	const char *reason;
@@ -1830,7 +1830,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
 		u8 fault_reason;
 		u16 source_id;
 		u64 guest_addr;
-		int type, pasid;
+		u32 pasid;
+		int type;
 		u32 data;
 		bool pasid_present;
 
diff --git a/drivers/iommu/intel/intel-pasid.h b/drivers/iommu/intel/intel-pasid.h
index c5318d40e0fa..bb5137183145 100644
--- a/drivers/iommu/intel/intel-pasid.h
+++ b/drivers/iommu/intel/intel-pasid.h
@@ -72,7 +72,7 @@ struct pasid_entry {
 struct pasid_table {
 	void			*table;		/* pasid table pointer */
 	int			order;		/* page order of pasid table */
-	int			max_pasid;	/* max pasid */
+	u32			max_pasid;	/* max pasid */
 	struct list_head	dev;		/* device list */
 };
 
@@ -98,31 +98,31 @@ static inline bool pasid_pte_is_present(struct pasid_entry *pte)
 	return READ_ONCE(pte->val[0]) & PASID_PTE_PRESENT;
 }
 
-extern u32 intel_pasid_max_id;
+extern unsigned int intel_pasid_max_id;
 int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);
-void intel_pasid_free_id(int pasid);
-void *intel_pasid_lookup_id(int pasid);
+void intel_pasid_free_id(u32 pasid);
+void *intel_pasid_lookup_id(u32 pasid);
 int intel_pasid_alloc_table(struct device *dev);
 void intel_pasid_free_table(struct device *dev);
 struct pasid_table *intel_pasid_get_table(struct device *dev);
 int intel_pasid_get_dev_max_id(struct device *dev);
-struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
+struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid);
 int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 				  struct device *dev, pgd_t *pgd,
-				  int pasid, u16 did, int flags);
+				  u32 pasid, u16 did, int flags);
 int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
-				   struct device *dev, int pasid);
+				   struct device *dev, u32 pasid);
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
-				   struct device *dev, int pasid);
+				   struct device *dev, u32 pasid);
 int intel_pasid_setup_nested(struct intel_iommu *iommu,
-			     struct device *dev, pgd_t *pgd, int pasid,
+			     struct device *dev, pgd_t *pgd, u32 pasid,
 			     struct iommu_gpasid_bind_data_vtd *pasid_data,
 			     struct dmar_domain *domain, int addr_width);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
-				 struct device *dev, int pasid,
+				 struct device *dev, u32 pasid,
 				 bool fault_ignore);
-int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
-void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
+int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid);
+void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid);
 #endif /* __INTEL_PASID_H */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e98..7d2dbc327c62 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2475,7 +2475,7 @@ dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
 static int domain_setup_first_level(struct intel_iommu *iommu,
 				    struct dmar_domain *domain,
 				    struct device *dev,
-				    int pasid)
+				    u32 pasid)
 {
 	int flags = PASID_FLAG_SUPERVISOR_MODE;
 	struct dma_pte *pgd = domain->pgd;
@@ -5155,7 +5155,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
 		return -ENODEV;
 
 	if (domain->default_pasid <= 0) {
-		int pasid;
+		u32 pasid;
 
 		/* No private data needed for the default pasid */
 		pasid = ioasid_alloc(NULL, PASID_MIN,
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c81f0f17c6ba..d8637c020281 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -27,7 +27,7 @@
 static DEFINE_SPINLOCK(pasid_lock);
 u32 intel_pasid_max_id = PASID_MAX;
 
-int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
+int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid)
 {
 	unsigned long flags;
 	u8 status_code;
@@ -58,7 +58,7 @@ int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
 	return ret;
 }
 
-void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
+void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid)
 {
 	unsigned long flags;
 	u8 status_code;
@@ -146,7 +146,7 @@ int intel_pasid_alloc_table(struct device *dev)
 	struct pasid_table *pasid_table;
 	struct pasid_table_opaque data;
 	struct page *pages;
-	int max_pasid = 0;
+	u32 max_pasid = 0;
 	int ret, order;
 	int size;
 
@@ -168,7 +168,7 @@ int intel_pasid_alloc_table(struct device *dev)
 	INIT_LIST_HEAD(&pasid_table->dev);
 
 	if (info->pasid_supported)
-		max_pasid = min_t(int, pci_max_pasids(to_pci_dev(dev)),
+		max_pasid = min_t(u32, pci_max_pasids(to_pci_dev(dev)),
 				  intel_pasid_max_id);
 
 	size = max_pasid >> (PASID_PDE_SHIFT - 3);
@@ -242,7 +242,7 @@ int intel_pasid_get_dev_max_id(struct device *dev)
 	return info->pasid_table->max_pasid;
 }
 
-struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
+struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid)
 {
 	struct device_domain_info *info;
 	struct pasid_table *pasid_table;
@@ -251,8 +251,7 @@ struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
 	int dir_index, index;
 
 	pasid_table = intel_pasid_get_table(dev);
-	if (WARN_ON(!pasid_table || pasid < 0 ||
-		    pasid >= intel_pasid_get_dev_max_id(dev)))
+	if (WARN_ON(!pasid_table || pasid >= intel_pasid_get_dev_max_id(dev)))
 		return NULL;
 
 	dir = pasid_table->table;
@@ -305,7 +304,7 @@ static inline void pasid_clear_entry_with_fpd(struct pasid_entry *pe)
 }
 
 static void
-intel_pasid_clear_entry(struct device *dev, int pasid, bool fault_ignore)
+intel_pasid_clear_entry(struct device *dev, u32 pasid, bool fault_ignore)
 {
 	struct pasid_entry *pe;
 
@@ -444,7 +443,7 @@ pasid_set_eafe(struct pasid_entry *pe)
 
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
-				    u16 did, int pasid)
+				    u16 did, u32 pasid)
 {
 	struct qi_desc desc;
 
@@ -473,7 +472,7 @@ iotlb_invalidation_with_pasid(struct intel_iommu *iommu, u16 did, u32 pasid)
 
 static void
 devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
-			       struct device *dev, int pasid)
+			       struct device *dev, u32 pasid)
 {
 	struct device_domain_info *info;
 	u16 sid, qdep, pfsid;
@@ -490,7 +489,7 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 }
 
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
-				 int pasid, bool fault_ignore)
+				 u32 pasid, bool fault_ignore)
 {
 	struct pasid_entry *pte;
 	u16 did;
@@ -515,7 +514,7 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
 
 static void pasid_flush_caches(struct intel_iommu *iommu,
 				struct pasid_entry *pte,
-				int pasid, u16 did)
+			       u32 pasid, u16 did)
 {
 	if (!ecap_coherent(iommu->ecap))
 		clflush_cache_range(pte, sizeof(*pte));
@@ -534,7 +533,7 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
  */
 int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 				  struct device *dev, pgd_t *pgd,
-				  int pasid, u16 did, int flags)
+				  u32 pasid, u16 did, int flags)
 {
 	struct pasid_entry *pte;
 
@@ -607,7 +606,7 @@ static inline int iommu_skip_agaw(struct dmar_domain *domain,
  */
 int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
-				   struct device *dev, int pasid)
+				   struct device *dev, u32 pasid)
 {
 	struct pasid_entry *pte;
 	struct dma_pte *pgd;
@@ -665,7 +664,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
  */
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
-				   struct device *dev, int pasid)
+				   struct device *dev, u32 pasid)
 {
 	u16 did = FLPT_DEFAULT_DID;
 	struct pasid_entry *pte;
@@ -751,7 +750,7 @@ intel_pasid_setup_bind_data(struct intel_iommu *iommu, struct pasid_entry *pte,
  * @addr_width: Address width of the first level (guest)
  */
 int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
-			     pgd_t *gpgd, int pasid,
+			     pgd_t *gpgd, u32 pasid,
 			     struct iommu_gpasid_bind_data_vtd *pasid_data,
 			     struct dmar_domain *domain, int addr_width)
 {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 6c87c807a0ab..778089d198eb 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -23,7 +23,7 @@
 #include "intel-pasid.h"
 
 static irqreturn_t prq_event_thread(int irq, void *d);
-static void intel_svm_drain_prq(struct device *dev, int pasid);
+static void intel_svm_drain_prq(struct device *dev, u32 pasid);
 
 #define PRQ_ORDER 0
 
@@ -371,7 +371,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 	return ret;
 }
 
-int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
 	struct intel_svm_dev *sdev;
@@ -601,7 +601,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
 }
 
 /* Caller must hold pasid_mutex */
-static int intel_svm_unbind_mm(struct device *dev, int pasid)
+static int intel_svm_unbind_mm(struct device *dev, u32 pasid)
 {
 	struct intel_svm_dev *sdev;
 	struct intel_iommu *iommu;
@@ -728,7 +728,7 @@ static bool is_canonical_address(u64 addr)
  * described in VT-d spec CH7.10 to drain all page requests and page
  * responses pending in the hardware.
  */
-static void intel_svm_drain_prq(struct device *dev, int pasid)
+static void intel_svm_drain_prq(struct device *dev, u32 pasid)
 {
 	struct device_domain_info *info;
 	struct dmar_domain *domain;
@@ -988,10 +988,10 @@ void intel_svm_unbind(struct iommu_sva *sva)
 	mutex_unlock(&pasid_mutex);
 }
 
-int intel_svm_get_pasid(struct iommu_sva *sva)
+u32 intel_svm_get_pasid(struct iommu_sva *sva)
 {
 	struct intel_svm_dev *sdev;
-	int pasid;
+	u32 pasid;
 
 	mutex_lock(&pasid_mutex);
 	sdev = to_intel_svm_dev(sva);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d43120eb1dc5..56dbabdd7f5f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2828,7 +2828,7 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
 }
 EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
 
-int iommu_sva_get_pasid(struct iommu_sva *handle)
+u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 {
 	const struct iommu_ops *ops = handle->dev->bus->iommu_ops;
 
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 107028e77ca3..5cc6ebccba65 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -92,7 +92,7 @@ static long uacce_fops_compat_ioctl(struct file *filep,
 
 static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
 {
-	int pasid;
+	u32 pasid;
 	struct iommu_sva *handle;
 
 	if (!(uacce->flags & UACCE_DEV_SVA))
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 21e950e4ab62..450717299928 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -76,7 +76,7 @@ extern void amd_iommu_free_device(struct pci_dev *pdev);
  *
  * The function returns 0 on success or a negative value on error.
  */
-extern int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
+extern int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
 				struct task_struct *task);
 
 /**
@@ -88,7 +88,7 @@ extern int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
  * When this function returns the device is no longer using the PASID
  * and the PASID is no longer bound to its task.
  */
-extern void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid);
+extern void amd_iommu_unbind_pasid(struct pci_dev *pdev, u32 pasid);
 
 /**
  * amd_iommu_set_invalid_ppr_cb() - Register a call-back for failed
@@ -114,7 +114,7 @@ extern void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid);
 #define AMD_IOMMU_INV_PRI_RSP_FAIL	2
 
 typedef int (*amd_iommu_invalid_ppr_cb)(struct pci_dev *pdev,
-					int pasid,
+					u32 pasid,
 					unsigned long address,
 					u16);
 
@@ -166,7 +166,7 @@ extern int amd_iommu_device_info(struct pci_dev *pdev,
  * @cb: The call-back function
  */
 
-typedef void (*amd_iommu_invalidate_ctx)(struct pci_dev *pdev, int pasid);
+typedef void (*amd_iommu_invalidate_ctx)(struct pci_dev *pdev, u32 pasid);
 
 extern int amd_iommu_set_invalidate_ctx_cb(struct pci_dev *pdev,
 					   amd_iommu_invalidate_ctx cb);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 3e8fa1c7a1e6..643951e28dd4 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -550,7 +550,7 @@ struct dmar_domain {
 					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
 	u64		max_addr;	/* maximum mapped address */
 
-	int		default_pasid;	/*
+	u32		default_pasid;	/*
 					 * The default pasid used for non-SVM
 					 * traffic on mediated devices.
 					 */
@@ -707,7 +707,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			      u32 pasid, u16 qdep, u64 addr,
 			      unsigned int size_order, u64 granu);
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
-			  int pasid);
+			  u32 pasid);
 
 int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 		   unsigned int count, unsigned long options);
@@ -735,11 +735,11 @@ extern int intel_svm_enable_prq(struct intel_iommu *iommu);
 extern int intel_svm_finish_prq(struct intel_iommu *iommu);
 int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
 			  struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, int pasid);
+int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
 struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
 				 void *drvdata);
 void intel_svm_unbind(struct iommu_sva *handle);
-int intel_svm_get_pasid(struct iommu_sva *handle);
+u32 intel_svm_get_pasid(struct iommu_sva *handle);
 struct svm_dev_ops;
 
 struct intel_svm_dev {
@@ -748,7 +748,7 @@ struct intel_svm_dev {
 	struct device *dev;
 	struct svm_dev_ops *ops;
 	struct iommu_sva sva;
-	int pasid;
+	u32 pasid;
 	int users;
 	u16 did;
 	u16 dev_iotlb:1;
@@ -761,7 +761,7 @@ struct intel_svm {
 
 	struct intel_iommu *iommu;
 	int flags;
-	int pasid;
+	u32 pasid;
 	int gpasid; /* In case that guest PASID is different from host PASID */
 	struct list_head devs;
 	struct list_head list;
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index c9e7e601950d..39d368a810b8 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -11,7 +11,7 @@
 struct device;
 
 struct svm_dev_ops {
-	void (*fault_cb)(struct device *dev, int pasid, u64 address,
+	void (*fault_cb)(struct device *dev, u32 pasid, u64 address,
 			 void *private, int rwxp, int response);
 };
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5f0b7859d2eb..9188ec5b39c5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -292,7 +292,7 @@ struct iommu_ops {
 	struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
 				      void *drvdata);
 	void (*sva_unbind)(struct iommu_sva *handle);
-	int (*sva_get_pasid)(struct iommu_sva *handle);
+	u32 (*sva_get_pasid)(struct iommu_sva *handle);
 
 	int (*page_response)(struct device *dev,
 			     struct iommu_fault_event *evt,
@@ -302,7 +302,7 @@ struct iommu_ops {
 	int (*sva_bind_gpasid)(struct iommu_domain *domain,
 			struct device *dev, struct iommu_gpasid_bind_data *data);
 
-	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
+	int (*sva_unbind_gpasid)(struct device *dev, u32 pasid);
 
 	int (*def_domain_type)(struct device *dev);
 
@@ -656,7 +656,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
 					struct mm_struct *mm,
 					void *drvdata);
 void iommu_sva_unbind_device(struct iommu_sva *handle);
-int iommu_sva_get_pasid(struct iommu_sva *handle);
+u32 iommu_sva_get_pasid(struct iommu_sva *handle);
 
 #else /* CONFIG_IOMMU_API */
 
@@ -1049,7 +1049,7 @@ static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
 {
 }
 
-static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
+static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 {
 	return IOMMU_PASID_INVALID;
 }
@@ -1068,7 +1068,7 @@ static inline int iommu_sva_bind_gpasid(struct iommu_domain *domain,
 }
 
 static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-					   struct device *dev, int pasid)
+					   struct device *dev, u32 pasid)
 {
 	return -ENODEV;
 }
diff --git a/include/linux/uacce.h b/include/linux/uacce.h
index 454c2f6672d7..48e319f40275 100644
--- a/include/linux/uacce.h
+++ b/include/linux/uacce.h
@@ -81,7 +81,7 @@ struct uacce_queue {
 	struct list_head list;
 	struct uacce_qfile_region *qfrs[UACCE_MAX_REGION];
 	enum uacce_q_state state;
-	int pasid;
+	u32 pasid;
 	struct iommu_sva *handle;
 };
 
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 02/12] iommu/vt-d: Change flags type to unsigned int in binding mm
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
  2020-07-13 23:47 ` [PATCH v6 01/12] iommu: Change type of pasid to u32 Fenghua Yu
@ 2020-07-13 23:47 ` Fenghua Yu
  2020-07-13 23:47 ` [PATCH v6 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing) Fenghua Yu
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:47 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".

Change its type to "unsigned int".

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
---
v5:
- Reviewed by Lu Baolu

v2:
- Add this new patch per Thomas' comment.

 drivers/iommu/intel/svm.c   | 7 ++++---
 include/linux/intel-iommu.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 778089d198eb..8a0cf2f0dd54 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -427,7 +427,8 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
 
 /* Caller must hold pasid_mutex, mm reference */
 static int
-intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
+intel_svm_bind_mm(struct device *dev, unsigned int flags,
+		  struct svm_dev_ops *ops,
 		  struct mm_struct *mm, struct intel_svm_dev **sd)
 {
 	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
@@ -954,7 +955,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
 {
 	struct iommu_sva *sva = ERR_PTR(-EINVAL);
 	struct intel_svm_dev *sdev = NULL;
-	int flags = 0;
+	unsigned int flags = 0;
 	int ret;
 
 	/*
@@ -963,7 +964,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
 	 * and intel_svm etc.
 	 */
 	if (drvdata)
-		flags = *(int *)drvdata;
+		flags = *(unsigned int *)drvdata;
 	mutex_lock(&pasid_mutex);
 	ret = intel_svm_bind_mm(dev, flags, NULL, mm, &sdev);
 	if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 643951e28dd4..d129baf7e0b8 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -760,7 +760,7 @@ struct intel_svm {
 	struct mm_struct *mm;
 
 	struct intel_iommu *iommu;
-	int flags;
+	unsigned int flags;
 	u32 pasid;
 	int gpasid; /* In case that guest PASID is different from host PASID */
 	struct list_head devs;
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
  2020-07-13 23:47 ` [PATCH v6 01/12] iommu: Change type of pasid to u32 Fenghua Yu
  2020-07-13 23:47 ` [PATCH v6 02/12] iommu/vt-d: Change flags type to unsigned int in binding mm Fenghua Yu
@ 2020-07-13 23:47 ` Fenghua Yu
  2020-07-14  3:25   ` Liu, Yi L
  2020-07-13 23:47 ` [PATCH v6 04/12] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions Fenghua Yu
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:47 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

From: Ashok Raj <ashok.raj@intel.com>

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
v3:
- Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
- Fix a couple of typos (Baolu)

v2:
- Fix the doc format and add the doc in toctree (Thomas)
- Modify the doc for better description (Thomas, Tony, Dave)

 Documentation/x86/index.rst |   1 +
 Documentation/x86/sva.rst   | 287 ++++++++++++++++++++++++++++++++++++
 2 files changed, 288 insertions(+)
 create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..e5d5ff096685 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
    usb-legacy-support
    i386/index
    x86_64/index
+   sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index 000000000000..7242a84169ef
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,287 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Shared Virtual Addressing (SVA) with ENQCMD
+===========================================
+
+Background
+==========
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM)
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU also is required
+to support PCIe features ATS and PRI. ATS allows devices to cache
+translations for the virtual address. IOMMU driver uses the mmu_notifier()
+support to keep the device tlb cache and the CPU cache in sync. PRI allows
+the device to request paging the virtual address before using if they are
+not paged in the CPU page tables.
+
+
+Shared Hardware Workqueues
+==========================
+
+Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VM's). This allows better hardware utilization vs. hard
+partitioning resources that could result in under utilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+======
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement fairness or ensure forward progress can be made.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permit hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=============================
+
+A new thread scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA capable device this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU specific api
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+iommu_sva_bind_device(), which will do the following.
+
+- Allocate the PASID, and program the process page-table (cr3) in the PASID
+  context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+  the device tlb in sync. For example, when a page-table entry is invalidated,
+  IOMMU propagates the invalidation to device tlb. This will force any
+  future access by the device to this virtual address to participate in
+  ATS. If the IOMMU responds with proper response that a page is not
+  present, the device would request the page to be paged in via the PCIe PRI
+  protocol before performing I/O.
+
+This MSR is managed with the XSAVE feature set as "supervisor state" to
+ensure the MSR is updated during context switch.
+
+PASID Management
+================
+
+The kernel must allocate a PASID on behalf of each process and program it
+into the new MSR to communicate the process identity to platform hardware.
+ENQCMD uses the PASID stored in this MSR to tag requests from this process.
+When a user submits a work descriptor to a device using the ENQCMD
+instruction, the PASID field in the descriptor is auto-filled with the
+value from MSR_IA32_PASID. Requests for DMA from the device are also tagged
+with the same PASID. The platform IOMMU uses the PASID in the transaction to
+perform address translation. The IOMMU api's setup the corresponding PASID
+entry in IOMMU with the process address used by the CPU (for e.g cr3 in x86).
+
+The MSR must be configured on each logical CPU before any application
+thread can interact with a device. Threads that belong to the same
+process share the same page tables, thus the same MSR value.
+
+PASID is cleared when a process is created. The PASID allocation and MSR
+programming may occur long after a process and its threads have been created.
+One thread must call bind() to allocate the PASID for the process. If a
+thread uses ENQCMD without the MSR first being populated, it will cause #GP.
+The kernel will fix up the #GP by writing the process-wide PASID into the
+thread that took the #GP. A single process PASID can be used simultaneously
+with multiple devices since they all share the same address space.
+
+New threads could inherit the MSR value from the parent. But this would
+involve additional state management for those threads which may never use
+ENQCMD. Clearing the MSR at thread creation permits all threads to have a
+consistent behavior; the PASID is only programmed when the thread calls
+the bind() api (iommu_sva_bind_device()()), or when a thread calls ENQCMD for
+the first time.
+
+PASID Lifecycle Management
+==========================
+
+Only processes that access SVA capable devices need to have a PASID
+allocated. This allocation happens when a process first opens an SVA
+capable device (subsequent opens of the same, or other devices will
+share the same PASID).
+
+Although the PASID is allocated to the process by opening a device,
+it is not active in any of the threads of that process. Activation is
+done lazily when a thread tries to submit a work descriptor to a device
+using the ENQCMD.
+
+That first access will trigger a #GP fault because the IA32_PASID MSR
+has not been initialized with the PASID value assigned to the process
+when the device was opened. The Linux #GP handler notes that a PASID as
+been allocated for the process, and so initializes the IA32_PASID MSR
+and returns so that the ENQCMD instruction is re-executed.
+
+On fork(2) or exec(2) the PASID is removed from the process as it no
+longer has the same address space that it had when the device was opened.
+
+On clone(2) the new task shares the same address space, so will be
+able to use the PASID allocated to the process. The IA32_PASID is not
+preemptively initialized as the kernel does not know whether this thread
+is going to access the device.
+
+On exit(2) the PASID is freed. The device driver ensures that any pending
+operations queued to the device are either completed or aborted before
+allowing the PASID to be reallocated.
+
+Relationships
+=============
+
+ * Each process has many threads, but only one PASID
+ * Devices have a limited number (~10's to 1000's) of hardware
+   workqueues and each portal maps down to a single workqueue.
+   The device driver manages allocating hardware workqueues.
+ * A single mmap() maps a single hardware workqueue as a "portal"
+ * For each device with which a process interacts, there must be
+   one or more mmap()'d portals.
+ * Many threads within a process can share a single portal to access
+   a single device.
+ * Multiple processes can separately mmap() the same portal, in
+   which case they still share one device hardware workqueue.
+ * The single process-wide PASID is used by all threads to interact
+   with all devices.  There is not, for instance, a PASID for each
+   thread or each thread<->device pair.
+
+FAQ
+===
+
+* What is SVA/SVM?
+
+Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
+work in the same address space. In short, sharing the address space. Some
+call it Shared Virtual Memory (SVM), but Linux community wanted to avoid
+it with Posix Shared Memory and Secure Virtual Machines which were terms
+already in circulation.
+
+* What is a PASID?
+
+A Process Address Space ID (PASID) is a PCIe-defined TLP Prefix. A PASID is
+a 20 bit number allocated and managed by the OS. PASID is included in all
+transactions between the platform and the device.
+
+* How are shared work queues different?
+
+Traditionally to allow user space applications interact with hardware,
+there is a separate instance required per process. For example, consider
+doorbells as a mechanism of informing hardware about work to process. Each
+doorbell is required to be spaced 4k (or page-size) apart for process
+isolation. This requires hardware to provision that space and reserve in
+MMIO. This doesn't scale as the number of threads becomes quite large. The
+hardware also manages the queue depth for Shared Work Queues (SWQ), and
+consumers don't need to track queue depth. If there is no space to accept
+a command, the device will return an error indicating retry. Also
+submitting a command to an MMIO address that can't accept ENQCMD will
+return retry in response. In the new DMWr PCIe terminology, devices need to
+support DMWr completer capability. In addition it requires all switch ports
+to support DMWr routing and must be enabled by the PCIe subsystem, much
+like how PCIe Atomics() are managed for instance.
+
+SWQ allows hardware to provision just a single address in the device. When
+used with ENQCMD to submit work, the device can distinguish the process
+submitting the work since it will include the PASID assigned to that
+process. This decreases the pressure of hardware requiring to support
+hardware to scale to a large number of processes.
+
+* Is this the same as a user space device driver?
+
+Communicating with the device via the shared work queue is much simpler
+than a full blown user space driver. The kernel driver does all the
+initialization of the hardware. User space only needs to worry about
+submitting work and processing completions.
+
+* Is this the same as SR-IOV?
+
+Single Root I/O Virtualization (SR-IOV) focuses on providing independent
+hardware interfaces for virtualizing hardware. Hence its required to be
+almost fully functional interface to software supporting the traditional
+BAR's, space for interrupts via MSI-x, its own register layout.
+Virtual Functions (VFs) are assisted by the Physical Function (PF)
+driver.
+
+Scalable I/O Virtualization builds on the PASID concept to create device
+instances for virtualization. SIOV requires host software to assist in
+creating virtual devices, each virtual device is represented by a PASID
+along with the BDF of the device.  This allows device hardware to optimize
+device resource creation and can grow dynamically on demand. SR-IOV creation
+and management is very static in nature. Consult references below for more
+details.
+
+* Why not just create a virtual function for each app?
+
+Creating PCIe SRIOV type virtual functions (VF) are expensive. They create
+duplicated hardware for PCI config space requirements, Interrupts such as
+MSIx for instance. Resources such as interrupts have to be hard partitioned
+between VF's at creation time, and cannot scale dynamically on demand. The
+VF's are not completely independent from the Physical function (PF). Most
+VF's require some communication and assistance from the PF driver. SIOV
+creates a software defined device. Where all the configuration and control
+aspects are mediated via the slow path. The work submission and completion
+happen without any mediation.
+
+* Does this support virtualization?
+
+ENQCMD can be used from within a guest VM. In these cases the VMM helps
+with setting up a translation table to translate from Guest PASID to Host
+PASID. Please consult the ENQCMD instruction set reference for more
+details.
+
+* Does memory need to be pinned?
+
+When devices support SVA, along with platform hardware such as IOMMU
+supporting such devices, there is no need to pin memory for DMA purposes.
+Devices that support SVA also support other PCIe features that remove the
+pinning requirement for memory.
+
+Device TLB support - Device requests the IOMMU to lookup an address before
+use via Address Translation Service (ATS) requests.  If the mapping exists
+but there is no page allocated by the OS, IOMMU hardware returns that no
+mapping exists.
+
+Device requests that virtual address to be mapped via Page Request
+Interface (PRI). Once the OS has successfully completed  the mapping, it
+returns the response back to the device. The device continues again to
+request for a translation and continues.
+
+IOMMU works with the OS in managing consistency of page-tables with the
+device. When removing pages, it interacts with the device to remove any
+device-tlb that might have been cached before removing the mappings from
+the OS.
+
+References
+==========
+
+VT-D:
+https://01.org/blogs/ashokraj/2018/recent-enhancements-intel-virtualization-technology-directed-i/o-intel-vt-d
+
+SIOV:
+https://01.org/blogs/2019/assignable-interfaces-intel-scalable-i/o-virtualization-linux
+
+ENQCMD in ISE:
+https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
+
+DSA spec:
+https://software.intel.com/sites/default/files/341204-intel-data-streaming-accelerator-spec.pdf
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 04/12] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
                   ` (2 preceding siblings ...)
  2020-07-13 23:47 ` [PATCH v6 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing) Fenghua Yu
@ 2020-07-13 23:47 ` Fenghua Yu
  2020-07-13 23:48 ` [PATCH v6 05/12] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature Fenghua Yu
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:47 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

Work submission instruction comes in two flavors. ENQCMD can be called
both in ring 3 and ring 0 and always uses the contents of PASID MSR when
shipping the command to the device. ENQCMDS allows a kernel driver to
submit commands on behalf of a user process. The driver supplies the
PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel
as of now.

The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
v2:
- Re-write commit message (Thomas)

 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 02dabc9e77b0..4469618c410f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -351,6 +351,7 @@
 #define X86_FEATURE_CLDEMOTE		(16*32+25) /* CLDEMOTE instruction */
 #define X86_FEATURE_MOVDIRI		(16*32+27) /* MOVDIRI instruction */
 #define X86_FEATURE_MOVDIR64B		(16*32+28) /* MOVDIR64B instruction */
+#define X86_FEATURE_ENQCMD		(16*32+29) /* ENQCMD and ENQCMDS instructions */
 
 /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
 #define X86_FEATURE_OVERFLOW_RECOV	(17*32+ 0) /* MCA overflow recovery support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3cbe24ca80ab..3a02707c1f4d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
 	{ X86_FEATURE_CQM_MBM_TOTAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_CQM_MBM_LOCAL,		X86_FEATURE_CQM_LLC   },
 	{ X86_FEATURE_AVX512_BF16,		X86_FEATURE_AVX512VL  },
+	{ X86_FEATURE_ENQCMD,			X86_FEATURE_XSAVES    },
 	{}
 };
 
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 05/12] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
                   ` (3 preceding siblings ...)
  2020-07-13 23:47 ` [PATCH v6 04/12] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions Fenghua Yu
@ 2020-07-13 23:48 ` Fenghua Yu
  2020-07-13 23:48 ` [PATCH v6 06/12] x86/msr-index: Define IA32_PASID MSR Fenghua Yu
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:48 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, Yu-cheng Yu, x86, linux-kernel, amd-gfx, iommu

From: Yu-cheng Yu <yu-cheng.yu@intel.com>

ENQCMD instruction reads PASID from IA32_PASID MSR. The MSR is stored
in the task's supervisor FPU PASID state and is context switched by
XSAVES/XRSTORS.

Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
v2:
- Modify the commit message (Thomas)

 arch/x86/include/asm/fpu/types.h  | 10 ++++++++++
 arch/x86/include/asm/fpu/xstate.h |  2 +-
 arch/x86/kernel/fpu/xstate.c      |  4 ++++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f098f6cab94b..00f8efd4c07d 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -114,6 +114,7 @@ enum xfeature {
 	XFEATURE_Hi16_ZMM,
 	XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
 	XFEATURE_PKRU,
+	XFEATURE_PASID,
 
 	XFEATURE_MAX,
 };
@@ -128,6 +129,7 @@ enum xfeature {
 #define XFEATURE_MASK_Hi16_ZMM		(1 << XFEATURE_Hi16_ZMM)
 #define XFEATURE_MASK_PT		(1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
 #define XFEATURE_MASK_PKRU		(1 << XFEATURE_PKRU)
+#define XFEATURE_MASK_PASID		(1 << XFEATURE_PASID)
 
 #define XFEATURE_MASK_FPSSE		(XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
 #define XFEATURE_MASK_AVX512		(XFEATURE_MASK_OPMASK \
@@ -229,6 +231,14 @@ struct pkru_state {
 	u32				pad;
 } __packed;
 
+/*
+ * State component 10 is supervisor state used for context-switching the
+ * PASID state.
+ */
+struct ia32_pasid_state {
+	u64 pasid;
+} __packed;
+
 struct xstate_header {
 	u64				xfeatures;
 	u64				xcomp_bv;
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 422d8369012a..ab9833c57aaa 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -33,7 +33,7 @@
 				      XFEATURE_MASK_BNDCSR)
 
 /* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (0)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)
 
 /*
  * Unsupported supervisor features. When a supervisor feature in this mask is
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index bda2e5eaca0e..31629e43383c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -37,6 +37,7 @@ static const char *xfeature_names[] =
 	"AVX-512 ZMM_Hi256"		,
 	"Processor Trace (unused)"	,
 	"Protection Keys User registers",
+	"PASID state",
 	"unknown xstate feature"	,
 };
 
@@ -51,6 +52,7 @@ static short xsave_cpuid_features[] __initdata = {
 	X86_FEATURE_AVX512F,
 	X86_FEATURE_INTEL_PT,
 	X86_FEATURE_PKU,
+	X86_FEATURE_ENQCMD,
 };
 
 /*
@@ -316,6 +318,7 @@ static void __init print_xstate_features(void)
 	print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
 	print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
 	print_xstate_feature(XFEATURE_MASK_PKRU);
+	print_xstate_feature(XFEATURE_MASK_PASID);
 }
 
 /*
@@ -590,6 +593,7 @@ static void check_xstate_against_struct(int nr)
 	XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
 	XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM,  struct avx_512_hi16_state);
 	XCHECK_SZ(sz, nr, XFEATURE_PKRU,      struct pkru_state);
+	XCHECK_SZ(sz, nr, XFEATURE_PASID,     struct ia32_pasid_state);
 
 	/*
 	 * Make *SURE* to add any feature numbers in below if
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 06/12] x86/msr-index: Define IA32_PASID MSR
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
                   ` (4 preceding siblings ...)
  2020-07-13 23:48 ` [PATCH v6 05/12] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature Fenghua Yu
@ 2020-07-13 23:48 ` Fenghua Yu
  2020-07-13 23:48 ` [PATCH v6 07/12] mm: Define pasid in mm Fenghua Yu
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:48 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses PASID to identify process
address space and direct responses to the right address space.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
v2:
- Change "identify process" to "identify process address space" in the
  commit message (Thomas)

 arch/x86/include/asm/msr-index.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e8370e64a155..e5f699ff1dd6 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -237,6 +237,9 @@
 #define MSR_IA32_LASTINTFROMIP		0x000001dd
 #define MSR_IA32_LASTINTTOIP		0x000001de
 
+#define MSR_IA32_PASID			0x00000d93
+#define MSR_IA32_PASID_VALID		BIT_ULL(31)
+
 /* DEBUGCTLMSR bits (others vary by model): */
 #define DEBUGCTLMSR_LBR			(1UL <<  0) /* last branch recording */
 #define DEBUGCTLMSR_BTF_SHIFT		1
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 07/12] mm: Define pasid in mm
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
                   ` (5 preceding siblings ...)
  2020-07-13 23:48 ` [PATCH v6 06/12] x86/msr-index: Define IA32_PASID MSR Fenghua Yu
@ 2020-07-13 23:48 ` Fenghua Yu
  2020-07-13 23:48 ` [PATCH v6 08/12] fork: Clear PASID for new mm Fenghua Yu
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:48 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

PASID is shared by all threads in a process. So the logical place to keep
track of it is in the "mm". Both ARM and X86 need to use the PASID in the
"mm".

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
v4:
- Change PASID type to u32 (Christoph)

v3:
- Change CONFIG_PCI_PASID to CONFIG_IOMMU_SUPPORT because non-PCI device
  can have PASID in ARM (Jean)

v2:
- This new patch moves "pasid" from x86 specific mm_context_t to generic
  struct mm_struct per Christopher's comment: https://lore.kernel.org/linux-iommu/20200414170252.714402-1-jean-philippe@linaro.org/T/#mb57110ffe1aaa24750eeea4f93b611f0d1913911
- Jean-Philippe Brucker released a virtually same patch. I still put this
  patch in the series for better review. The upstream kernel only needs one
  of the two patches eventually.
https://lore.kernel.org/linux-iommu/20200519175502.2504091-2-jean-philippe@linaro.org/
- Change CONFIG_IOASID to CONFIG_PCI_PASID (Ashok)

 include/linux/mm_types.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64ede5f150dc..d61285cfe027 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -538,6 +538,10 @@ struct mm_struct {
 		atomic_long_t hugetlb_usage;
 #endif
 		struct work_struct async_put_work;
+
+#ifdef CONFIG_IOMMU_SUPPORT
+		u32 pasid;
+#endif
 	} __randomize_layout;
 
 	/*
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 08/12] fork: Clear PASID for new mm
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
                   ` (6 preceding siblings ...)
  2020-07-13 23:48 ` [PATCH v6 07/12] mm: Define pasid in mm Fenghua Yu
@ 2020-07-13 23:48 ` Fenghua Yu
  2021-02-24 10:19   ` Jean-Philippe Brucker
  2020-07-13 23:48 ` [PATCH v6 09/12] x86/process: Clear PASID state for a newly forked/cloned thread Fenghua Yu
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:48 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

When a new mm is created, its PASID should be cleared, i.e. the PASID is
initialized to its init state 0 on both ARM and X86.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
v2:
- Add this patch to initialize PASID value for a new mm.

 include/linux/mm_types.h | 2 ++
 kernel/fork.c            | 8 ++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index d61285cfe027..d60d2ec10881 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -22,6 +22,8 @@
 #endif
 #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
 
+/* Initial PASID value is 0. */
+#define INIT_PASID	0
 
 struct address_space;
 struct mem_cgroup;
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..43b5f112604d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1007,6 +1007,13 @@ static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
 #endif
 }
 
+static void mm_init_pasid(struct mm_struct *mm)
+{
+#ifdef CONFIG_IOMMU_SUPPORT
+	mm->pasid = INIT_PASID;
+#endif
+}
+
 static void mm_init_uprobes_state(struct mm_struct *mm)
 {
 #ifdef CONFIG_UPROBES
@@ -1035,6 +1042,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_init_cpumask(mm);
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
+	mm_init_pasid(mm);
 	RCU_INIT_POINTER(mm->exe_file, NULL);
 	mmu_notifier_subscriptions_init(mm);
 	init_tlb_flush_pending(mm);
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 09/12] x86/process: Clear PASID state for a newly forked/cloned thread
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
                   ` (7 preceding siblings ...)
  2020-07-13 23:48 ` [PATCH v6 08/12] fork: Clear PASID for new mm Fenghua Yu
@ 2020-07-13 23:48 ` Fenghua Yu
  2020-08-01  1:44   ` Andy Lutomirski
  2020-07-13 23:48 ` [PATCH v6 10/12] x86/mmu: Allocate/free PASID Fenghua Yu
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:48 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

The PASID state has to be cleared on forks, since the child has a
different address space. The PASID is also cleared for thread clone. While
it would be correct to inherit the PASID in this case, it is unknown
whether the new task will use ENQCMD. Giving it the PASID "just in case"
would have the downside of increased context switch overhead to setting
the PASID MSR.

Since #GP faults have to be handled on any threads that were created before
the PASID was assigned to the mm of the process, newly created threads
might as well be treated in a consistent way.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
---
v2:
- Modify init_task_pasid().

 arch/x86/kernel/process.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f362ce0d5ac0..1b1492e337a6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -121,6 +121,21 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
 		return do_set_thread_area_64(p, ARCH_SET_FS, tls);
 }
 
+/* Initialize the PASID state for the forked/cloned thread. */
+static void init_task_pasid(struct task_struct *task)
+{
+	struct ia32_pasid_state *ppasid;
+
+	/*
+	 * Initialize the PASID state so that the PASID MSR will be
+	 * initialized to its initial state (0) by XRSTORS when the task is
+	 * scheduled for the first time.
+	 */
+	ppasid = get_xsave_addr(&task->thread.fpu.state.xsave, XFEATURE_PASID);
+	if (ppasid)
+		ppasid->pasid = INIT_PASID;
+}
+
 int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 		    unsigned long arg, struct task_struct *p, unsigned long tls)
 {
@@ -174,6 +189,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
 	task_user_gs(p) = get_user_gs(current_pt_regs());
 #endif
 
+	if (static_cpu_has(X86_FEATURE_ENQCMD))
+		init_task_pasid(p);
+
 	/* Set a new TLS for the child thread? */
 	if (clone_flags & CLONE_SETTLS)
 		ret = set_new_tls(p, tls);
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 10/12] x86/mmu: Allocate/free PASID
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
                   ` (8 preceding siblings ...)
  2020-07-13 23:48 ` [PATCH v6 09/12] x86/process: Clear PASID state for a newly forked/cloned thread Fenghua Yu
@ 2020-07-13 23:48 ` Fenghua Yu
  2020-07-13 23:48 ` [PATCH v6 11/12] sched: Define and initialize a flag to identify valid PASID in the task Fenghua Yu
  2020-07-13 23:48 ` [PATCH v6 12/12] x86/traps: Fix up invalid PASID Fenghua Yu
  11 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:48 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

A PASID is allocated for an "mm" the first time any thread attaches
to an SVM capable device. Later device attachments (whether to the same
device or another SVM device) will re-use the same PASID.

The PASID is freed when the process exits (so no need to keep
reference counts on how many SVM devices are sharing the PASID).

Currently the ENQCMD feature cannot be used if CONFIG_INTEL_IOMMU_SVM
is not set. Add X86_FEATURE_ENQCMD to the disabled features mask as
appropriate and use cpu_feature_enabled() to check the feature.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
---
v5:
- Mark ENQCMD disabled when configured out and remove redundant
  CONFIG_INTEL_IOMMU_SVM check which is included in cpu_feature_enabled()
  in fixup_pasid_exception() (PeterZ and Dave Hansen)
- Reviewed by Lu Baolu

v4:
- Change PASID type to u32 (Christoph)

v3:
- Add sanity checks in alloc_pasid() and _free_pasid() (Baolu)
- Add a comment that the private PASID feature will be removed completely
  from IOMMU and don't track private PASID in mm (Thomas)

v2:
- Define a helper free_bind() to simplify error exit code in bind_mm()
  (Thomas)
- Fix a ret error code in bind_mm() (Thomas)
- Change pasid's type from "int" to "unsigned int" to have consistent
  pasid type in iommu (Thomas)
- Simplify alloc_pasid() a bit.

 arch/x86/include/asm/disabled-features.h |   9 +-
 arch/x86/include/asm/iommu.h             |   2 +
 arch/x86/include/asm/mmu_context.h       |  11 ++
 drivers/iommu/intel/svm.c                | 128 ++++++++++++++++++++---
 4 files changed, 137 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h
index 4ea8584682f9..588d83e9da49 100644
--- a/arch/x86/include/asm/disabled-features.h
+++ b/arch/x86/include/asm/disabled-features.h
@@ -56,6 +56,12 @@
 # define DISABLE_PTI		(1 << (X86_FEATURE_PTI & 31))
 #endif
 
+#ifdef CONFIG_INTEL_IOMMU_SVM
+# define DISABLE_ENQCMD	0
+#else
+# define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31))
+#endif
+
 /*
  * Make sure to add features to the correct mask
  */
@@ -75,7 +81,8 @@
 #define DISABLED_MASK13	0
 #define DISABLED_MASK14	0
 #define DISABLED_MASK15	0
-#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP)
+#define DISABLED_MASK16	(DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \
+			 DISABLE_ENQCMD)
 #define DISABLED_MASK17	0
 #define DISABLED_MASK18	0
 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 19)
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index bf1ed2ddc74b..ed41259fe7ac 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
 	return -EINVAL;
 }
 
+void __free_pasid(struct mm_struct *mm);
+
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 47562147e70b..e1e7f1df6829 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
 #include <asm/tlbflush.h>
 #include <asm/paravirt.h>
 #include <asm/debugreg.h>
+#include <asm/iommu.h>
 
 extern atomic64_t last_mm_ctx_id;
 
@@ -117,9 +118,19 @@ static inline int init_new_context(struct task_struct *tsk,
 	init_new_context_ldt(mm);
 	return 0;
 }
+
+static inline void free_pasid(struct mm_struct *mm)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+		return;
+
+	__free_pasid(mm);
+}
+
 static inline void destroy_context(struct mm_struct *mm)
 {
 	destroy_context_ldt(mm);
+	free_pasid(mm);
 }
 
 extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 8a0cf2f0dd54..4c788880b037 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -425,6 +425,69 @@ int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
 	return ret;
 }
 
+static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,
+		      bool new_pasid)
+{
+	if (new_pasid)
+		ioasid_free(svm->pasid);
+	kfree(svm);
+	kfree(sdev);
+}
+
+/*
+ * If this mm already has a PASID, use it. Otherwise allocate a new one.
+ * Let the caller know if a new PASID is allocated via 'new_pasid'.
+ */
+static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
+		       u32 pasid_max, bool *new_pasid,
+		       unsigned int flags)
+{
+	u32 pasid;
+
+	*new_pasid = false;
+
+	/*
+	 * Reuse the PASID if the mm already has a PASID and not a private
+	 * PASID is requested.
+	 */
+	if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
+		void *p;
+
+		/*
+		 * Since the mm has a PASID already, the PASID should be
+		 * bound and unbound to the mm before calling this allocation.
+		 * So the PASID must be allocated by bind_mm() previously and
+		 * should still exist in ioasid; but its data must be cleared
+		 * already by unbind_mm().
+		 *
+		 * Do a sanity check here to ensure the PASID has the right
+		 * status before reusing it.
+		 */
+		p = ioasid_find(NULL, mm->pasid, NULL);
+		if (IS_ERR(p) || p)
+			return INVALID_IOASID;
+
+		/*
+		 * Once the PASID is allocated for this mm, it
+		 * stays with the mm until the mm is dropped. Reuse
+		 * the PASID which has been already allocated for the
+		 * mm instead of allocating a new one.
+		 */
+		ioasid_set_data(mm->pasid, svm);
+
+		return mm->pasid;
+	}
+
+	/* Allocate a new pasid. Do not use PASID 0, reserved for init PASID. */
+	pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, svm);
+	if (pasid != INVALID_IOASID) {
+		/* A new pasid is allocated. */
+		*new_pasid = true;
+	}
+
+	return pasid;
+}
+
 /* Caller must hold pasid_mutex, mm reference */
 static int
 intel_svm_bind_mm(struct device *dev, unsigned int flags,
@@ -518,6 +581,8 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
 	init_rcu_head(&sdev->rcu);
 
 	if (!svm) {
+		bool new_pasid;
+
 		svm = kzalloc(sizeof(*svm), GFP_KERNEL);
 		if (!svm) {
 			ret = -ENOMEM;
@@ -529,12 +594,9 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
 		if (pasid_max > intel_pasid_max_id)
 			pasid_max = intel_pasid_max_id;
 
-		/* Do not use PASID 0, reserved for RID to PASID */
-		svm->pasid = ioasid_alloc(NULL, PASID_MIN,
-					  pasid_max - 1, svm);
+		svm->pasid = alloc_pasid(svm, mm, pasid_max, &new_pasid, flags);
 		if (svm->pasid == INVALID_IOASID) {
-			kfree(svm);
-			kfree(sdev);
+			free_bind(svm, sdev, new_pasid);
 			ret = -ENOSPC;
 			goto out;
 		}
@@ -547,9 +609,7 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
 		if (mm) {
 			ret = mmu_notifier_register(&svm->notifier, mm);
 			if (ret) {
-				ioasid_free(svm->pasid);
-				kfree(svm);
-				kfree(sdev);
+				free_bind(svm, sdev, new_pasid);
 				goto out;
 			}
 		}
@@ -565,12 +625,20 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
 		if (ret) {
 			if (mm)
 				mmu_notifier_unregister(&svm->notifier, mm);
-			ioasid_free(svm->pasid);
-			kfree(svm);
-			kfree(sdev);
+			free_bind(svm, sdev, new_pasid);
 			goto out;
 		}
 
+		if (mm && new_pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
+			/*
+			 * Track the new pasid in the mm. The pasid will be
+			 * freed at process exit.
+			 *
+			 * The private PASID feature will be removed soon from
+			 * IOMMU. Don't track requested private PASID in the mm.
+			 */
+			mm->pasid = svm->pasid;
+		}
 		list_add_tail(&svm->list, &global_svm_list);
 	} else {
 		/*
@@ -640,7 +708,8 @@ static int intel_svm_unbind_mm(struct device *dev, u32 pasid)
 			kfree_rcu(sdev, rcu);
 
 			if (list_empty(&svm->devs)) {
-				ioasid_free(svm->pasid);
+				/* Clear data in the pasid. */
+				ioasid_set_data(pasid, NULL);
 				if (svm->mm)
 					mmu_notifier_unregister(&svm->notifier, svm->mm);
 				list_del(&svm->list);
@@ -1001,3 +1070,38 @@ u32 intel_svm_get_pasid(struct iommu_sva *sva)
 
 	return pasid;
 }
+
+/*
+ * An invalid pasid is either 0 (init PASID value) or bigger than max PASID
+ * (PASID_MAX - 1).
+ */
+static bool invalid_pasid(u32 pasid)
+{
+	return (pasid == INIT_PASID) || (pasid >= PASID_MAX);
+}
+
+/* On process exit free the PASID (if one was allocated). */
+void __free_pasid(struct mm_struct *mm)
+{
+	u32 pasid = mm->pasid;
+	void *p;
+
+	/* No need to free invalid pasid. */
+	if (invalid_pasid(pasid))
+		return;
+
+	/* The pasid shouldn't be bound to any mm by now. */
+	p = ioasid_find(NULL, pasid, NULL);
+	if (!IS_ERR_OR_NULL(p)) {
+		pr_err("PASID %d is still in use\n", pasid);
+
+		return;
+	}
+
+	/*
+	 * Since the pasid is not bound to any svm, there is no race
+	 * here with binding/unbinding and no need to protect the free
+	 * operation by pasid_mutex.
+	 */
+	ioasid_free(pasid);
+}
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 11/12] sched: Define and initialize a flag to identify valid PASID in the task
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
                   ` (9 preceding siblings ...)
  2020-07-13 23:48 ` [PATCH v6 10/12] x86/mmu: Allocate/free PASID Fenghua Yu
@ 2020-07-13 23:48 ` Fenghua Yu
  2020-07-13 23:48 ` [PATCH v6 12/12] x86/traps: Fix up invalid PASID Fenghua Yu
  11 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:48 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

From: Peter Zijlstra <peterz@infradead.org>

The flag is defined for the task to identify if the task has a valid
PASID. Its initial value is 0 when the task is forked/cloned. It will
be used shortly.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
v2:
- Add this patch to define the flag to identify valid PASID MSR (PeterZ)

 include/linux/sched.h | 3 +++
 kernel/fork.c         | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 692e327d7455..042d6f5cde6a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -800,6 +800,9 @@ struct task_struct {
 	/* Stalled due to lack of memory */
 	unsigned			in_memstall:1;
 #endif
+#ifdef CONFIG_IOMMU_SUPPORT
+	unsigned			has_valid_pasid:1;
+#endif
 
 	unsigned long			atomic_flags; /* Flags requiring atomic access. */
 
diff --git a/kernel/fork.c b/kernel/fork.c
index 43b5f112604d..0a962bebdf88 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	tsk->use_memdelay = 0;
 #endif
 
+#ifdef CONFIG_IOMMU_SUPPORT
+	tsk->has_valid_pasid = 0;
+#endif
+
 #ifdef CONFIG_MEMCG
 	tsk->active_memcg = NULL;
 #endif
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
                   ` (10 preceding siblings ...)
  2020-07-13 23:48 ` [PATCH v6 11/12] sched: Define and initialize a flag to identify valid PASID in the task Fenghua Yu
@ 2020-07-13 23:48 ` Fenghua Yu
  2020-07-31 23:34   ` Andy Lutomirski
  2020-08-01  1:28   ` Andy Lutomirski
  11 siblings, 2 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-13 23:48 UTC (permalink / raw)
  To: Thomas Gleixner, Joerg Roedel, Ingo Molnar, Borislav Petkov,
	Peter Zijlstra, H Peter Anvin, David Woodhouse, Lu Baolu,
	Felix Kuehling, Dave Hansen, Tony Luck, Jean-Philippe Brucker,
	Christoph Hellwig, Ashok Raj, Jacob Jun Pan, Dave Jiang,
	Sohil Mehta, Ravi V Shankar
  Cc: Fenghua Yu, iommu, x86, linux-kernel, amd-gfx

A #GP fault is generated when ENQCMD instruction is executed without
a valid PASID value programmed in the current thread's PASID MSR. The
#GP fault handler will initialize the MSR if a PASID has been allocated
for this process.

Decoding the user instruction is ugly and sets a bad architecture
precedent. It may not function if the faulting instruction is modified
after #GP.

Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
without a valid PASID value programmed. #GP error codes are 16 bits and all
16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
when loading from the source operand, so its not fully comprehending all
the reasons. Rather than special case the ENQCMD, in future Intel may
choose a different fault mechanism for such cases if recovery is needed on
#GP.

The following heuristic is used to avoid decoding the user instructions
to determine the precise reason for the #GP fault:
1) If the mm for the process has not been allocated a PASID, this #GP
   cannot be fixed.
2) If the PASID MSR is already initialized, then the #GP was for some
   other reason
3) Try initializing the PASID MSR and returning. If the #GP was from
   an ENQCMD this will fix it. If not, the #GP fault will be repeated
   and will hit case "2".

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
---
v5:
- Use cpu_feature_enabled() and remove redundant CONFIG_INTEL_IOMMU_SVM
  check which is included in cpu_feature_enabled() in
  fixup_pasid_exception() (PeterZ and Dave Hansen)
- Reviewed by Lu Baolu

v4:
- Change PASID type to u32 (Christoph)

v3:
- Check and set current->has_valid_pasid in fixup() (PeterZ)
- Add fpu__pasid_write() to update the MSR (PeterZ)
- Add ioasid_find() sanity check in fixup()

v2:
- Update the first paragraph of the commit message (Thomas)
- Add reasons why don't decode the user instruction and don't use
  #GP error code (Thomas)
- Change get_task_mm() to current->mm (Thomas)
- Add comments on why IRQ is disabled during PASID fixup (Thomas)
- Add comment in fixup() that the function is called when #GP is from
  user (so mm is not NULL) (Dave Hansen)

 arch/x86/include/asm/iommu.h |  1 +
 arch/x86/kernel/traps.c      | 12 ++++++
 drivers/iommu/intel/svm.c    | 78 ++++++++++++++++++++++++++++++++++++
 3 files changed, 91 insertions(+)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index ed41259fe7ac..e9365a5d6f7d 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -27,5 +27,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
 }
 
 void __free_pasid(struct mm_struct *mm);
+bool __fixup_pasid_exception(void);
 
 #endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index f58679e487f6..fe0f7d00523b 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
 #include <asm/umip.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
+#include <asm/iommu.h>
 
 #ifdef CONFIG_X86_64
 #include <asm/x86_init.h>
@@ -518,6 +519,14 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
 	return GP_CANONICAL;
 }
 
+static bool fixup_pasid_exception(void)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+		return false;
+
+	return __fixup_pasid_exception();
+}
+
 #define GPFSTR "general protection fault"
 
 DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
@@ -530,6 +539,9 @@ DEFINE_IDTENTRY_ERRORCODE(exc_general_protection)
 
 	cond_local_irq_enable(regs);
 
+	if (user_mode(regs) && fixup_pasid_exception())
+		goto exit;
+
 	if (static_cpu_has(X86_FEATURE_UMIP)) {
 		if (user_mode(regs) && fixup_umip_exception(regs))
 			goto exit;
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4c788880b037..4a84c82a4f8c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1105,3 +1105,81 @@ void __free_pasid(struct mm_struct *mm)
 	 */
 	ioasid_free(pasid);
 }
+
+/*
+ * Write the current task's PASID MSR/state. This is called only when PASID
+ * is enabled.
+ */
+static void fpu__pasid_write(u32 pasid)
+{
+	u64 msr_val = pasid | MSR_IA32_PASID_VALID;
+
+	fpregs_lock();
+
+	/*
+	 * If the MSR is active and owned by the current task's FPU, it can
+	 * be directly written.
+	 *
+	 * Otherwise, write the fpstate.
+	 */
+	if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
+		wrmsrl(MSR_IA32_PASID, msr_val);
+	} else {
+		struct ia32_pasid_state *ppasid_state;
+
+		ppasid_state = get_xsave_addr(&current->thread.fpu.state.xsave,
+					      XFEATURE_PASID);
+		/*
+		 * ppasid_state shouldn't be NULL because XFEATURE_PASID
+		 * is enabled.
+		 */
+		WARN_ON_ONCE(!ppasid_state);
+		ppasid_state->pasid = msr_val;
+	}
+	fpregs_unlock();
+}
+
+/*
+ * Apply some heuristics to see if the #GP fault was caused by a thread
+ * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
+ * is the problem, try initializing the IA32_PASID MSR. If the heuristic
+ * guesses incorrectly, take one more #GP fault.
+ */
+bool __fixup_pasid_exception(void)
+{
+	struct intel_svm *svm;
+	struct mm_struct *mm;
+	u32 pasid;
+
+	/*
+	 * If the current task already has a valid PASID in the MSR,
+	 * the #GP must be for some other reason.
+	 */
+	if (current->has_valid_pasid)
+		return false;
+
+	/*
+	 * This function is called only when this #GP was triggered from user
+	 * space. So the mm cannot be NULL.
+	 */
+	mm = current->mm;
+	pasid = mm->pasid;
+
+	/*
+	 * If the PASID isn't found, cannot help.
+	 *
+	 * Don't care if the PASID is bound to the mm here. #PF will handle the
+	 * case that the MSR is fixed up by an unbound PASID.
+	 */
+	svm = ioasid_find(NULL, pasid, NULL);
+	if (IS_ERR_OR_NULL(svm))
+		return false;
+
+	/* Fix up the MSR by the PASID in the mm. */
+	fpu__pasid_write(pasid);
+
+	/* Now the current task has a valid PASID in the MSR. */
+	current->has_valid_pasid = 1;
+
+	return true;
+}
-- 
2.19.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* RE: [PATCH v6 01/12] iommu: Change type of pasid to u32
  2020-07-13 23:47 ` [PATCH v6 01/12] iommu: Change type of pasid to u32 Fenghua Yu
@ 2020-07-14  2:45   ` Liu, Yi L
  2020-07-14 13:54     ` Fenghua Yu
  2020-07-22 14:03   ` Joerg Roedel
  1 sibling, 1 reply; 36+ messages in thread
From: Liu, Yi L @ 2020-07-14  2:45 UTC (permalink / raw)
  To: Yu, Fenghua, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, H Peter Anvin, David Woodhouse,
	Lu Baolu, Felix Kuehling, Hansen, Dave, Luck, Tony,
	Jean-Philippe Brucker, Christoph Hellwig, Raj, Ashok, Pan,
	Jacob jun, Jiang, Dave, Mehta, Sohil, Shankar, Ravi V
  Cc: Yu, Fenghua, iommu, x86, linux-kernel, amd-gfx

> From: Fenghua Yu <fenghua.yu@intel.com>
> Sent: Tuesday, July 14, 2020 7:48 AM
> 
> PASID is defined as a few different types in iommu including "int",
> "u32", and "unsigned int". To be consistent and to match with uapi
> definitions, define PASID and its variations (e.g. max PASID) as "u32".
> "u32" is also shorter and a little more explicit than "unsigned int".
> 
> No PASID type change in uapi although it defines PASID as __u64 in
> some places.

just out of curious, why not using ioasid_t? In Linux kernel, PASID is
managed by ioasid.

Regards,
Yi Liu

> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
> v6:
> - Change return type to u32 for kfd_pasid_alloc() (Felix)
> 
> v5:
> - Reviewed by Lu Baolu
> 
> v4:
> - Change PASID type from "unsigned int" to "u32" (Christoph)
> 
> v2:
> - Create this new patch to define PASID as "unsigned int" consistently in
>   iommu (Thomas)
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  4 +--
>  .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c    |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  2 +-
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |  4 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c       |  6 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h       |  4 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        |  8 ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h        |  8 ++---
>  .../gpu/drm/amd/amdkfd/cik_event_interrupt.c  |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c       |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h       |  2 +-
>  .../drm/amd/amdkfd/kfd_device_queue_manager.c |  7 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c       |  8 ++---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.h       |  4 +--
>  drivers/gpu/drm/amd/amdkfd/kfd_iommu.c        |  6 ++--
>  drivers/gpu/drm/amd/amdkfd/kfd_pasid.c        |  4 +--
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h         | 20 ++++++------
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c      |  2 +-
>  .../gpu/drm/amd/include/kgd_kfd_interface.h   |  2 +-
>  drivers/iommu/amd/amd_iommu.h                 | 10 +++---
>  drivers/iommu/amd/iommu.c                     | 31 ++++++++++---------
>  drivers/iommu/amd/iommu_v2.c                  | 20 ++++++------
>  drivers/iommu/intel/dmar.c                    |  7 +++--
>  drivers/iommu/intel/intel-pasid.h             | 24 +++++++-------
>  drivers/iommu/intel/iommu.c                   |  4 +--
>  drivers/iommu/intel/pasid.c                   | 31 +++++++++----------
>  drivers/iommu/intel/svm.c                     | 12 +++----
>  drivers/iommu/iommu.c                         |  2 +-
>  drivers/misc/uacce/uacce.c                    |  2 +-
>  include/linux/amd-iommu.h                     |  8 ++---
>  include/linux/intel-iommu.h                   | 12 +++----
>  include/linux/intel-svm.h                     |  2 +-
>  include/linux/iommu.h                         | 10 +++---
>  include/linux/uacce.h                         |  2 +-
>  38 files changed, 141 insertions(+), 141 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index ffe149aafc39..dfef5a7e0f5a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -207,11 +207,11 @@ uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct
> kgd_dev *dst, struct kgd_dev *s
>  	})
> 
>  /* GPUVM API */
> -int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int
> pasid,
> +int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
>  					void **vm, void **process_info,
>  					struct dma_fence **ef);
>  int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
> -					struct file *filp, unsigned int pasid,
> +					struct file *filp, u32 pasid,
>  					void **vm, void **process_info,
>  					struct dma_fence **ef);
>  void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> index bf927f432506..ee531c3988d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
> @@ -105,7 +105,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev
> *kgd, uint32_t vmid,
>  	unlock_srbm(kgd);
>  }
> 
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
>  					unsigned int vmid)
>  {
>  	struct amdgpu_device *adev = get_amdgpu_device(kgd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> index 744366c7ee85..4d41317b9292 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> @@ -139,7 +139,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev
> *kgd, uint32_t vmid,
>  	unlock_srbm(kgd);
>  }
> 
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
>  					unsigned int vmid)
>  {
>  	struct amdgpu_device *adev = get_amdgpu_device(kgd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> index feab4cc6e836..35917d4b50f6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> @@ -96,7 +96,7 @@ static void kgd_program_sh_mem_settings(struct kgd_dev
> *kgd, uint32_t vmid,
>  	unlock_srbm(kgd);
>  }
> 
> -static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
> +static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
>  					unsigned int vmid)
>  {
>  	struct amdgpu_device *adev = get_amdgpu_device(kgd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index c7fd0c47b254..b4b88e4da4de 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -110,7 +110,7 @@ void kgd_gfx_v9_program_sh_mem_settings(struct kgd_dev
> *kgd, uint32_t vmid,
>  	unlock_srbm(kgd);
>  }
> 
> -int kgd_gfx_v9_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
> +int kgd_gfx_v9_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
>  					unsigned int vmid)
>  {
>  	struct amdgpu_device *adev = get_amdgpu_device(kgd);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
> index aedf67d57449..ff2bc72e6646 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
> @@ -26,7 +26,7 @@ void kgd_gfx_v9_program_sh_mem_settings(struct kgd_dev
> *kgd, uint32_t vmid,
>  		uint32_t sh_mem_config,
>  		uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit,
>  		uint32_t sh_mem_bases);
> -int kgd_gfx_v9_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
> +int kgd_gfx_v9_set_pasid_vmid_mapping(struct kgd_dev *kgd, u32 pasid,
>  		unsigned int vmid);
>  int kgd_gfx_v9_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
>  int kgd_gfx_v9_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> index b91b5171270f..9d06366320bc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -994,7 +994,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void
> **process_info,
>  	return ret;
>  }
> 
> -int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, unsigned int
> pasid,
> +int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, u32 pasid,
>  					  void **vm, void **process_info,
>  					  struct dma_fence **ef)
>  {
> @@ -1030,7 +1030,7 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct
> kgd_dev *kgd, unsigned int pasi
>  }
> 
>  int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
> -					   struct file *filp, unsigned int pasid,
> +					   struct file *filp, u32 pasid,
>  					   void **vm, void **process_info,
>  					   struct dma_fence **ef)
>  {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> index fe92dcd94d4a..3f697e9010d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
> @@ -43,7 +43,7 @@ static DEFINE_IDA(amdgpu_pasid_ida);
>  /* Helper to free pasid from a fence callback */
>  struct amdgpu_pasid_cb {
>  	struct dma_fence_cb cb;
> -	unsigned int pasid;
> +	u32 pasid;
>  };
> 
>  /**
> @@ -79,7 +79,7 @@ int amdgpu_pasid_alloc(unsigned int bits)
>   * amdgpu_pasid_free - Free a PASID
>   * @pasid: PASID to free
>   */
> -void amdgpu_pasid_free(unsigned int pasid)
> +void amdgpu_pasid_free(u32 pasid)
>  {
>  	trace_amdgpu_pasid_freed(pasid);
>  	ida_simple_remove(&amdgpu_pasid_ida, pasid);
> @@ -105,7 +105,7 @@ static void amdgpu_pasid_free_cb(struct dma_fence *fence,
>   * Free the pasid only after all the fences in resv are signaled.
>   */
>  void amdgpu_pasid_free_delayed(struct dma_resv *resv,
> -			       unsigned int pasid)
> +			       u32 pasid)
>  {
>  	struct dma_fence *fence, **fences;
>  	struct amdgpu_pasid_cb *cb;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
> index 8e58325bbca2..0c3b4fa1f936 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
> @@ -71,9 +71,9 @@ struct amdgpu_vmid_mgr {
>  };
> 
>  int amdgpu_pasid_alloc(unsigned int bits);
> -void amdgpu_pasid_free(unsigned int pasid);
> +void amdgpu_pasid_free(u32 pasid);
>  void amdgpu_pasid_free_delayed(struct dma_resv *resv,
> -			       unsigned int pasid);
> +			       u32 pasid);
> 
>  bool amdgpu_vmid_had_gpu_reset(struct amdgpu_device *adev,
>  			       struct amdgpu_vmid *id);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index d7e17e34fee1..8503cce467eb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -1062,7 +1062,7 @@ void amdgpu_driver_postclose_kms(struct drm_device
> *dev,
>  	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
>  	struct amdgpu_bo_list *list;
>  	struct amdgpu_bo *pd;
> -	unsigned int pasid;
> +	u32 pasid;
>  	int handle;
> 
>  	if (!fpriv)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 7417754e9141..ada2534dd8d9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2783,7 +2783,7 @@ long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long
> timeout)
>   * 0 for success, error for failure.
>   */
>  int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> -		   int vm_context, unsigned int pasid)
> +		   int vm_context, u32 pasid)
>  {
>  	struct amdgpu_bo_param bp;
>  	struct amdgpu_bo *root;
> @@ -2954,7 +2954,7 @@ static int amdgpu_vm_check_clean_reserved(struct
> amdgpu_device *adev,
>   * 0 for success, -errno for errors.
>   */
>  int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm
> *vm,
> -			   unsigned int pasid)
> +			   u32 pasid)
>  {
>  	bool pte_support_ats = (adev->asic_type == CHIP_RAVEN);
>  	int r;
> @@ -3252,7 +3252,7 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data,
> struct drm_file *filp)
>   * @pasid: PASID identifier for VM
>   * @task_info: task_info to fill.
>   */
> -void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
> +void amdgpu_vm_get_task_info(struct amdgpu_device *adev, u32 pasid,
>  			 struct amdgpu_task_info *task_info)
>  {
>  	struct amdgpu_vm *vm;
> @@ -3296,7 +3296,7 @@ void amdgpu_vm_set_task_info(struct amdgpu_vm *vm)
>   * Try to gracefully handle a VM fault. Return true if the fault was handled and
>   * shouldn't be reported any more.
>   */
> -bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
>  			    uint64_t addr)
>  {
>  	struct amdgpu_bo *root;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index c8e68d7890bf..c2f23b071f14 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -371,8 +371,8 @@ void amdgpu_vm_manager_fini(struct amdgpu_device
> *adev);
> 
>  long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long timeout);
>  int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
> -		   int vm_context, unsigned int pasid);
> -int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm
> *vm, unsigned int pasid);
> +		   int vm_context, u32 pasid);
> +int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm
> *vm, u32 pasid);
>  void amdgpu_vm_release_compute(struct amdgpu_device *adev, struct
> amdgpu_vm *vm);
>  void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
>  void amdgpu_vm_get_pd_bo(struct amdgpu_vm *vm,
> @@ -429,9 +429,9 @@ bool amdgpu_vm_need_pipeline_sync(struct amdgpu_ring
> *ring,
>  				  struct amdgpu_job *job);
>  void amdgpu_vm_check_compute_bug(struct amdgpu_device *adev);
> 
> -void amdgpu_vm_get_task_info(struct amdgpu_device *adev, unsigned int pasid,
> +void amdgpu_vm_get_task_info(struct amdgpu_device *adev, u32 pasid,
>  			     struct amdgpu_task_info *task_info);
> -bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, unsigned int pasid,
> +bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, u32 pasid,
>  			    uint64_t addr);
> 
>  void amdgpu_vm_set_task_info(struct amdgpu_vm *vm);
> diff --git a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> index 9f59ba93cfe0..c2fd30045ad5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> +++ b/drivers/gpu/drm/amd/amdkfd/cik_event_interrupt.c
> @@ -90,7 +90,7 @@ static void cik_event_interrupt_wq(struct kfd_dev *dev,
>  			(const struct cik_ih_ring_entry *)ih_ring_entry;
>  	uint32_t context_id = ihre->data & 0xfffffff;
>  	unsigned int vmid  = (ihre->ring_id & 0x0000ff00) >> 8;
> -	unsigned int pasid = (ihre->ring_id & 0xffff0000) >> 16;
> +	u32 pasid = (ihre->ring_id & 0xffff0000) >> 16;
> 
>  	if (pasid == 0)
>  		return;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> index 27bcc5b472f6..b258a3dae767 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> @@ -45,7 +45,7 @@ static void dbgdev_address_watch_disable_nodiq(struct
> kfd_dev *dev)
>  }
> 
>  static int dbgdev_diq_submit_ib(struct kfd_dbgdev *dbgdev,
> -				unsigned int pasid, uint64_t vmid0_address,
> +				u32 pasid, uint64_t vmid0_address,
>  				uint32_t *packet_buff, size_t size_in_bytes)
>  {
>  	struct pm4__release_mem *rm_packet;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h
> index a04a1fe1d0d9..f9c6df1fdc5c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgmgr.h
> @@ -275,7 +275,7 @@ struct kfd_dbgdev {
>  };
> 
>  struct kfd_dbgmgr {
> -	unsigned int pasid;
> +	u32 pasid;
>  	struct kfd_dev *dev;
>  	struct kfd_dbgdev *dbgdev;
>  };
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index e9c4867abeff..9571a6e5de4c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -40,7 +40,7 @@
>  #define CIK_HPD_EOP_BYTES (1U << CIK_HPD_EOP_BYTES_LOG2)
> 
>  static int set_pasid_vmid_mapping(struct device_queue_manager *dqm,
> -					unsigned int pasid, unsigned int vmid);
> +				  u32 pasid, unsigned int vmid);
> 
>  static int execute_queues_cpsch(struct device_queue_manager *dqm,
>  				enum kfd_unmap_queues_filter filter,
> @@ -909,7 +909,7 @@ static int unregister_process(struct device_queue_manager
> *dqm,
>  }
> 
>  static int
> -set_pasid_vmid_mapping(struct device_queue_manager *dqm, unsigned int pasid,
> +set_pasid_vmid_mapping(struct device_queue_manager *dqm, u32 pasid,
>  			unsigned int vmid)
>  {
>  	return dqm->dev->kfd2kgd->set_pasid_vmid_mapping(
> @@ -1925,8 +1925,7 @@ void device_queue_manager_uninit(struct
> device_queue_manager *dqm)
>  	kfree(dqm);
>  }
> 
> -int kfd_process_vm_fault(struct device_queue_manager *dqm,
> -			 unsigned int pasid)
> +int kfd_process_vm_fault(struct device_queue_manager *dqm, u32 pasid)
>  {
>  	struct kfd_process_device *pdd;
>  	struct kfd_process *p = kfd_lookup_process_by_pasid(pasid);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index a9583b95fcc1..ba2c2ce0c55a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -460,7 +460,7 @@ static void set_event_from_interrupt(struct kfd_process *p,
>  	}
>  }
> 
> -void kfd_signal_event_interrupt(unsigned int pasid, uint32_t partial_id,
> +void kfd_signal_event_interrupt(u32 pasid, uint32_t partial_id,
>  				uint32_t valid_id_bits)
>  {
>  	struct kfd_event *ev = NULL;
> @@ -872,7 +872,7 @@ static void lookup_events_by_type_and_signal(struct
> kfd_process *p,
>  }
> 
>  #ifdef KFD_SUPPORT_IOMMU_V2
> -void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
> +void kfd_signal_iommu_event(struct kfd_dev *dev, u32 pasid,
>  		unsigned long address, bool is_write_requested,
>  		bool is_execute_requested)
>  {
> @@ -950,7 +950,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned
> int pasid,
>  }
>  #endif /* KFD_SUPPORT_IOMMU_V2 */
> 
> -void kfd_signal_hw_exception_event(unsigned int pasid)
> +void kfd_signal_hw_exception_event(u32 pasid)
>  {
>  	/*
>  	 * Because we are called from arbitrary context (workqueue) as opposed
> @@ -971,7 +971,7 @@ void kfd_signal_hw_exception_event(unsigned int pasid)
>  	kfd_unref_process(p);
>  }
> 
> -void kfd_signal_vm_fault_event(struct kfd_dev *dev, unsigned int pasid,
> +void kfd_signal_vm_fault_event(struct kfd_dev *dev, u32 pasid,
>  				struct kfd_vm_fault_info *info)
>  {
>  	struct kfd_event *ev;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.h
> index c7ac6c73af86..c8fe5dbdad55 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.h
> @@ -79,7 +79,7 @@ struct kfd_event {
>  #define KFD_EVENT_TYPE_DEBUG 5
>  #define KFD_EVENT_TYPE_MEMORY 8
> 
> -extern void kfd_signal_event_interrupt(unsigned int pasid, uint32_t partial_id,
> -					uint32_t valid_id_bits);
> +extern void kfd_signal_event_interrupt(u32 pasid, uint32_t partial_id,
> +				       uint32_t valid_id_bits);
> 
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
> index 7c8786b9eb0a..e8ef3886688b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
> @@ -139,7 +139,7 @@ void kfd_iommu_unbind_process(struct kfd_process *p)
>  }
> 
>  /* Callback for process shutdown invoked by the IOMMU driver */
> -static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
> +static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, u32 pasid)
>  {
>  	struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
>  	struct kfd_process *p;
> @@ -185,8 +185,8 @@ static void iommu_pasid_shutdown_callback(struct pci_dev
> *pdev, int pasid)
>  }
> 
>  /* This function called by IOMMU driver on PPR failure */
> -static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
> -		unsigned long address, u16 flags)
> +static int iommu_invalid_ppr_cb(struct pci_dev *pdev, u32 pasid,
> +				unsigned long address, u16 flags)
>  {
>  	struct kfd_dev *dev;
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
> index 33b08ff00b50..c19a2e6fd7c8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pasid.c
> @@ -51,7 +51,7 @@ unsigned int kfd_get_pasid_limit(void)
>  	return 1U << pasid_bits;
>  }
> 
> -unsigned int kfd_pasid_alloc(void)
> +u32 kfd_pasid_alloc(void)
>  {
>  	int r;
> 
> @@ -77,7 +77,7 @@ unsigned int kfd_pasid_alloc(void)
>  	return r > 0 ? r : 0;
>  }
> 
> -void kfd_pasid_free(unsigned int pasid)
> +void kfd_pasid_free(u32 pasid)
>  {
>  	if (kfd2kgd)
>  		amdgpu_pasid_free(pasid);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index fee60921fccf..09db037bb48f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -715,7 +715,7 @@ struct kfd_process {
>  	/* We want to receive a notification when the mm_struct is destroyed */
>  	struct mmu_notifier mmu_notifier;
> 
> -	uint16_t pasid;
> +	u32 pasid;
>  	unsigned int doorbell_index;
> 
>  	/*
> @@ -790,7 +790,7 @@ int kfd_process_create_wq(void);
>  void kfd_process_destroy_wq(void);
>  struct kfd_process *kfd_create_process(struct file *filep);
>  struct kfd_process *kfd_get_process(const struct task_struct *);
> -struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid);
> +struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid);
>  struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm);
>  void kfd_unref_process(struct kfd_process *p);
>  int kfd_process_evict_queues(struct kfd_process *p);
> @@ -831,8 +831,8 @@ int kfd_pasid_init(void);
>  void kfd_pasid_exit(void);
>  bool kfd_set_pasid_limit(unsigned int new_limit);
>  unsigned int kfd_get_pasid_limit(void);
> -unsigned int kfd_pasid_alloc(void);
> -void kfd_pasid_free(unsigned int pasid);
> +u32 kfd_pasid_alloc(void);
> +void kfd_pasid_free(u32 pasid);
> 
>  /* Doorbells */
>  size_t kfd_doorbell_process_slice(struct kfd_dev *kfd);
> @@ -917,7 +917,7 @@ void device_queue_manager_uninit(struct
> device_queue_manager *dqm);
>  struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
>  					enum kfd_queue_type type);
>  void kernel_queue_uninit(struct kernel_queue *kq, bool hanging);
> -int kfd_process_vm_fault(struct device_queue_manager *dqm, unsigned int pasid);
> +int kfd_process_vm_fault(struct device_queue_manager *dqm, u32 pasid);
> 
>  /* Process Queue Manager */
>  struct process_queue_node {
> @@ -1039,12 +1039,12 @@ int kfd_wait_on_events(struct kfd_process *p,
>  		       uint32_t num_events, void __user *data,
>  		       bool all, uint32_t user_timeout_ms,
>  		       uint32_t *wait_result);
> -void kfd_signal_event_interrupt(unsigned int pasid, uint32_t partial_id,
> +void kfd_signal_event_interrupt(u32 pasid, uint32_t partial_id,
>  				uint32_t valid_id_bits);
>  void kfd_signal_iommu_event(struct kfd_dev *dev,
> -		unsigned int pasid, unsigned long address,
> -		bool is_write_requested, bool is_execute_requested);
> -void kfd_signal_hw_exception_event(unsigned int pasid);
> +			    u32 pasid, unsigned long address,
> +			    bool is_write_requested, bool is_execute_requested);
> +void kfd_signal_hw_exception_event(u32 pasid);
>  int kfd_set_event(struct kfd_process *p, uint32_t event_id);
>  int kfd_reset_event(struct kfd_process *p, uint32_t event_id);
>  int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
> @@ -1055,7 +1055,7 @@ int kfd_event_create(struct file *devkfd, struct
> kfd_process *p,
>  		     uint64_t *event_page_offset, uint32_t *event_slot_index);
>  int kfd_event_destroy(struct kfd_process *p, uint32_t event_id);
> 
> -void kfd_signal_vm_fault_event(struct kfd_dev *dev, unsigned int pasid,
> +void kfd_signal_vm_fault_event(struct kfd_dev *dev, u32 pasid,
>  				struct kfd_vm_fault_info *info);
> 
>  void kfd_signal_reset_event(struct kfd_dev *dev);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 0e0c42e9f6a3..7567647e976d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1087,7 +1087,7 @@ void kfd_process_device_remove_obj_handle(struct
> kfd_process_device *pdd,
>  }
> 
>  /* This increments the process->ref counter. */
> -struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid)
> +struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid)
>  {
>  	struct kfd_process *p, *ret_p = NULL;
>  	unsigned int temp;
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index a3c238c39ef5..301de493377a 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -226,7 +226,7 @@ struct kfd2kgd_calls {
>  			uint32_t sh_mem_config,	uint32_t
> sh_mem_ape1_base,
>  			uint32_t sh_mem_ape1_limit, uint32_t sh_mem_bases);
> 
> -	int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, unsigned int pasid,
> +	int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, u32 pasid,
>  					unsigned int vmid);
> 
>  	int (*init_interrupts)(struct kgd_dev *kgd, uint32_t pipe_id);
> diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
> index f892992c8744..1fae58dd3c25 100644
> --- a/drivers/iommu/amd/amd_iommu.h
> +++ b/drivers/iommu/amd/amd_iommu.h
> @@ -45,12 +45,12 @@ extern int amd_iommu_register_ppr_notifier(struct
> notifier_block *nb);
>  extern int amd_iommu_unregister_ppr_notifier(struct notifier_block *nb);
>  extern void amd_iommu_domain_direct_map(struct iommu_domain *dom);
>  extern int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids);
> -extern int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
> +extern int amd_iommu_flush_page(struct iommu_domain *dom, u32 pasid,
>  				u64 address);
> -extern int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid);
> -extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
> +extern int amd_iommu_flush_tlb(struct iommu_domain *dom, u32 pasid);
> +extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, u32 pasid,
>  				     unsigned long cr3);
> -extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid);
> +extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, u32
> pasid);
>  extern struct iommu_domain *amd_iommu_get_v2_domain(struct pci_dev *pdev);
> 
>  #ifdef CONFIG_IRQ_REMAP
> @@ -66,7 +66,7 @@ static inline int amd_iommu_create_irq_domain(struct
> amd_iommu *iommu)
>  #define PPR_INVALID			0x1
>  #define PPR_FAILURE			0xf
> 
> -extern int amd_iommu_complete_ppr(struct pci_dev *pdev, int pasid,
> +extern int amd_iommu_complete_ppr(struct pci_dev *pdev, u32 pasid,
>  				  int status, int tag);
> 
>  static inline bool is_rd890_iommu(struct pci_dev *pdev)
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 74cca1757172..8398a38af719 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -502,10 +502,11 @@ static void amd_iommu_report_page_fault(u16 devid,
> u16 domain_id,
>  static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
>  {
>  	struct device *dev = iommu->iommu.dev;
> -	int type, devid, pasid, flags, tag;
> +	int type, devid, flags, tag;
>  	volatile u32 *event = __evt;
>  	int count = 0;
>  	u64 address;
> +	u32 pasid;
> 
>  retry:
>  	type    = (event[1] >> EVENT_TYPE_SHIFT)  & EVENT_TYPE_MASK;
> @@ -898,7 +899,7 @@ static void build_inv_iotlb_pages(struct iommu_cmd *cmd,
> u16 devid, int qdep,
>  		cmd->data[2] |= CMD_INV_IOMMU_PAGES_SIZE_MASK;
>  }
> 
> -static void build_inv_iommu_pasid(struct iommu_cmd *cmd, u16 domid, int pasid,
> +static void build_inv_iommu_pasid(struct iommu_cmd *cmd, u16 domid, u32 pasid,
>  				  u64 address, bool size)
>  {
>  	memset(cmd, 0, sizeof(*cmd));
> @@ -916,7 +917,7 @@ static void build_inv_iommu_pasid(struct iommu_cmd *cmd,
> u16 domid, int pasid,
>  	CMD_SET_TYPE(cmd, CMD_INV_IOMMU_PAGES);
>  }
> 
> -static void build_inv_iotlb_pasid(struct iommu_cmd *cmd, u16 devid, int pasid,
> +static void build_inv_iotlb_pasid(struct iommu_cmd *cmd, u16 devid, u32 pasid,
>  				  int qdep, u64 address, bool size)
>  {
>  	memset(cmd, 0, sizeof(*cmd));
> @@ -936,7 +937,7 @@ static void build_inv_iotlb_pasid(struct iommu_cmd *cmd,
> u16 devid, int pasid,
>  	CMD_SET_TYPE(cmd, CMD_INV_IOTLB_PAGES);
>  }
> 
> -static void build_complete_ppr(struct iommu_cmd *cmd, u16 devid, int pasid,
> +static void build_complete_ppr(struct iommu_cmd *cmd, u16 devid, u32 pasid,
>  			       int status, int tag, bool gn)
>  {
>  	memset(cmd, 0, sizeof(*cmd));
> @@ -2772,7 +2773,7 @@ int amd_iommu_domain_enable_v2(struct
> iommu_domain *dom, int pasids)
>  }
>  EXPORT_SYMBOL(amd_iommu_domain_enable_v2);
> 
> -static int __flush_pasid(struct protection_domain *domain, int pasid,
> +static int __flush_pasid(struct protection_domain *domain, u32 pasid,
>  			 u64 address, bool size)
>  {
>  	struct iommu_dev_data *dev_data;
> @@ -2833,13 +2834,13 @@ static int __flush_pasid(struct protection_domain
> *domain, int pasid,
>  	return ret;
>  }
> 
> -static int __amd_iommu_flush_page(struct protection_domain *domain, int pasid,
> +static int __amd_iommu_flush_page(struct protection_domain *domain, u32 pasid,
>  				  u64 address)
>  {
>  	return __flush_pasid(domain, pasid, address, false);
>  }
> 
> -int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
> +int amd_iommu_flush_page(struct iommu_domain *dom, u32 pasid,
>  			 u64 address)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
> @@ -2854,13 +2855,13 @@ int amd_iommu_flush_page(struct iommu_domain
> *dom, int pasid,
>  }
>  EXPORT_SYMBOL(amd_iommu_flush_page);
> 
> -static int __amd_iommu_flush_tlb(struct protection_domain *domain, int pasid)
> +static int __amd_iommu_flush_tlb(struct protection_domain *domain, u32 pasid)
>  {
>  	return __flush_pasid(domain, pasid,
> CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
>  			     true);
>  }
> 
> -int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid)
> +int amd_iommu_flush_tlb(struct iommu_domain *dom, u32 pasid)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
>  	unsigned long flags;
> @@ -2874,7 +2875,7 @@ int amd_iommu_flush_tlb(struct iommu_domain *dom,
> int pasid)
>  }
>  EXPORT_SYMBOL(amd_iommu_flush_tlb);
> 
> -static u64 *__get_gcr3_pte(u64 *root, int level, int pasid, bool alloc)
> +static u64 *__get_gcr3_pte(u64 *root, int level, u32 pasid, bool alloc)
>  {
>  	int index;
>  	u64 *pte;
> @@ -2906,7 +2907,7 @@ static u64 *__get_gcr3_pte(u64 *root, int level, int pasid,
> bool alloc)
>  	return pte;
>  }
> 
> -static int __set_gcr3(struct protection_domain *domain, int pasid,
> +static int __set_gcr3(struct protection_domain *domain, u32 pasid,
>  		      unsigned long cr3)
>  {
>  	struct domain_pgtable pgtable;
> @@ -2925,7 +2926,7 @@ static int __set_gcr3(struct protection_domain *domain,
> int pasid,
>  	return __amd_iommu_flush_tlb(domain, pasid);
>  }
> 
> -static int __clear_gcr3(struct protection_domain *domain, int pasid)
> +static int __clear_gcr3(struct protection_domain *domain, u32 pasid)
>  {
>  	struct domain_pgtable pgtable;
>  	u64 *pte;
> @@ -2943,7 +2944,7 @@ static int __clear_gcr3(struct protection_domain *domain,
> int pasid)
>  	return __amd_iommu_flush_tlb(domain, pasid);
>  }
> 
> -int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
> +int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, u32 pasid,
>  			      unsigned long cr3)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
> @@ -2958,7 +2959,7 @@ int amd_iommu_domain_set_gcr3(struct iommu_domain
> *dom, int pasid,
>  }
>  EXPORT_SYMBOL(amd_iommu_domain_set_gcr3);
> 
> -int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid)
> +int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, u32 pasid)
>  {
>  	struct protection_domain *domain = to_pdomain(dom);
>  	unsigned long flags;
> @@ -2972,7 +2973,7 @@ int amd_iommu_domain_clear_gcr3(struct
> iommu_domain *dom, int pasid)
>  }
>  EXPORT_SYMBOL(amd_iommu_domain_clear_gcr3);
> 
> -int amd_iommu_complete_ppr(struct pci_dev *pdev, int pasid,
> +int amd_iommu_complete_ppr(struct pci_dev *pdev, u32 pasid,
>  			   int status, int tag)
>  {
>  	struct iommu_dev_data *dev_data;
> diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
> index e4b025c5637c..a769985ff26e 100644
> --- a/drivers/iommu/amd/iommu_v2.c
> +++ b/drivers/iommu/amd/iommu_v2.c
> @@ -40,7 +40,7 @@ struct pasid_state {
>  	struct mmu_notifier mn;                 /* mmu_notifier handle */
>  	struct pri_queue pri[PRI_QUEUE_SIZE];	/* PRI tag states */
>  	struct device_state *device_state;	/* Link to our device_state */
> -	int pasid;				/* PASID index */
> +	u32 pasid;				/* PASID index */
>  	bool invalid;				/* Used during setup and
>  						   teardown of the pasid */
>  	spinlock_t lock;			/* Protect pri_queues and
> @@ -70,7 +70,7 @@ struct fault {
>  	struct mm_struct *mm;
>  	u64 address;
>  	u16 devid;
> -	u16 pasid;
> +	u32 pasid;
>  	u16 tag;
>  	u16 finish;
>  	u16 flags;
> @@ -150,7 +150,7 @@ static void put_device_state(struct device_state *dev_state)
> 
>  /* Must be called under dev_state->lock */
>  static struct pasid_state **__get_pasid_state_ptr(struct device_state *dev_state,
> -						  int pasid, bool alloc)
> +						  u32 pasid, bool alloc)
>  {
>  	struct pasid_state **root, **ptr;
>  	int level, index;
> @@ -184,7 +184,7 @@ static struct pasid_state **__get_pasid_state_ptr(struct
> device_state *dev_state
> 
>  static int set_pasid_state(struct device_state *dev_state,
>  			   struct pasid_state *pasid_state,
> -			   int pasid)
> +			   u32 pasid)
>  {
>  	struct pasid_state **ptr;
>  	unsigned long flags;
> @@ -211,7 +211,7 @@ static int set_pasid_state(struct device_state *dev_state,
>  	return ret;
>  }
> 
> -static void clear_pasid_state(struct device_state *dev_state, int pasid)
> +static void clear_pasid_state(struct device_state *dev_state, u32 pasid)
>  {
>  	struct pasid_state **ptr;
>  	unsigned long flags;
> @@ -229,7 +229,7 @@ static void clear_pasid_state(struct device_state *dev_state,
> int pasid)
>  }
> 
>  static struct pasid_state *get_pasid_state(struct device_state *dev_state,
> -					   int pasid)
> +					   u32 pasid)
>  {
>  	struct pasid_state **ptr, *ret = NULL;
>  	unsigned long flags;
> @@ -594,7 +594,7 @@ static struct notifier_block ppr_nb = {
>  	.notifier_call = ppr_notifier,
>  };
> 
> -int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
> +int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
>  			 struct task_struct *task)
>  {
>  	struct pasid_state *pasid_state;
> @@ -615,7 +615,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
>  		return -EINVAL;
> 
>  	ret = -EINVAL;
> -	if (pasid < 0 || pasid >= dev_state->max_pasids)
> +	if (pasid >= dev_state->max_pasids)
>  		goto out;
> 
>  	ret = -ENOMEM;
> @@ -679,7 +679,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
>  }
>  EXPORT_SYMBOL(amd_iommu_bind_pasid);
> 
> -void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid)
> +void amd_iommu_unbind_pasid(struct pci_dev *pdev, u32 pasid)
>  {
>  	struct pasid_state *pasid_state;
>  	struct device_state *dev_state;
> @@ -695,7 +695,7 @@ void amd_iommu_unbind_pasid(struct pci_dev *pdev, int
> pasid)
>  	if (dev_state == NULL)
>  		return;
> 
> -	if (pasid < 0 || pasid >= dev_state->max_pasids)
> +	if (pasid >= dev_state->max_pasids)
>  		goto out;
> 
>  	pasid_state = get_pasid_state(dev_state, pasid);
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 683b812c5c47..2b75a1561377 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1466,7 +1466,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
> u16 sid, u16 pfsid,
>  }
> 
>  void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
> -			  u64 granu, int pasid)
> +			  u64 granu, u32 pasid)
>  {
>  	struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> 
> @@ -1780,7 +1780,7 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
>  }
> 
>  static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
> -		u8 fault_reason, int pasid, u16 source_id,
> +		u8 fault_reason, u32 pasid, u16 source_id,
>  		unsigned long long addr)
>  {
>  	const char *reason;
> @@ -1830,7 +1830,8 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
>  		u8 fault_reason;
>  		u16 source_id;
>  		u64 guest_addr;
> -		int type, pasid;
> +		u32 pasid;
> +		int type;
>  		u32 data;
>  		bool pasid_present;
> 
> diff --git a/drivers/iommu/intel/intel-pasid.h b/drivers/iommu/intel/intel-pasid.h
> index c5318d40e0fa..bb5137183145 100644
> --- a/drivers/iommu/intel/intel-pasid.h
> +++ b/drivers/iommu/intel/intel-pasid.h
> @@ -72,7 +72,7 @@ struct pasid_entry {
>  struct pasid_table {
>  	void			*table;		/* pasid table pointer */
>  	int			order;		/* page order of pasid table */
> -	int			max_pasid;	/* max pasid */
> +	u32			max_pasid;	/* max pasid */
>  	struct list_head	dev;		/* device list */
>  };
> 
> @@ -98,31 +98,31 @@ static inline bool pasid_pte_is_present(struct pasid_entry
> *pte)
>  	return READ_ONCE(pte->val[0]) & PASID_PTE_PRESENT;
>  }
> 
> -extern u32 intel_pasid_max_id;
> +extern unsigned int intel_pasid_max_id;
>  int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);
> -void intel_pasid_free_id(int pasid);
> -void *intel_pasid_lookup_id(int pasid);
> +void intel_pasid_free_id(u32 pasid);
> +void *intel_pasid_lookup_id(u32 pasid);
>  int intel_pasid_alloc_table(struct device *dev);
>  void intel_pasid_free_table(struct device *dev);
>  struct pasid_table *intel_pasid_get_table(struct device *dev);
>  int intel_pasid_get_dev_max_id(struct device *dev);
> -struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
> +struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid);
>  int intel_pasid_setup_first_level(struct intel_iommu *iommu,
>  				  struct device *dev, pgd_t *pgd,
> -				  int pasid, u16 did, int flags);
> +				  u32 pasid, u16 did, int flags);
>  int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>  				   struct dmar_domain *domain,
> -				   struct device *dev, int pasid);
> +				   struct device *dev, u32 pasid);
>  int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>  				   struct dmar_domain *domain,
> -				   struct device *dev, int pasid);
> +				   struct device *dev, u32 pasid);
>  int intel_pasid_setup_nested(struct intel_iommu *iommu,
> -			     struct device *dev, pgd_t *pgd, int pasid,
> +			     struct device *dev, pgd_t *pgd, u32 pasid,
>  			     struct iommu_gpasid_bind_data_vtd *pasid_data,
>  			     struct dmar_domain *domain, int addr_width);
>  void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> -				 struct device *dev, int pasid,
> +				 struct device *dev, u32 pasid,
>  				 bool fault_ignore);
> -int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
> -void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid);
> +void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid);
>  #endif /* __INTEL_PASID_H */
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index d759e7234e98..7d2dbc327c62 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2475,7 +2475,7 @@ dmar_search_domain_by_dev_info(int segment, int bus,
> int devfn)
>  static int domain_setup_first_level(struct intel_iommu *iommu,
>  				    struct dmar_domain *domain,
>  				    struct device *dev,
> -				    int pasid)
> +				    u32 pasid)
>  {
>  	int flags = PASID_FLAG_SUPERVISOR_MODE;
>  	struct dma_pte *pgd = domain->pgd;
> @@ -5155,7 +5155,7 @@ static int aux_domain_add_dev(struct dmar_domain
> *domain,
>  		return -ENODEV;
> 
>  	if (domain->default_pasid <= 0) {
> -		int pasid;
> +		u32 pasid;
> 
>  		/* No private data needed for the default pasid */
>  		pasid = ioasid_alloc(NULL, PASID_MIN,
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index c81f0f17c6ba..d8637c020281 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -27,7 +27,7 @@
>  static DEFINE_SPINLOCK(pasid_lock);
>  u32 intel_pasid_max_id = PASID_MAX;
> 
> -int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, u32 *pasid)
>  {
>  	unsigned long flags;
>  	u8 status_code;
> @@ -58,7 +58,7 @@ int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> *pasid)
>  	return ret;
>  }
> 
> -void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> +void vcmd_free_pasid(struct intel_iommu *iommu, u32 pasid)
>  {
>  	unsigned long flags;
>  	u8 status_code;
> @@ -146,7 +146,7 @@ int intel_pasid_alloc_table(struct device *dev)
>  	struct pasid_table *pasid_table;
>  	struct pasid_table_opaque data;
>  	struct page *pages;
> -	int max_pasid = 0;
> +	u32 max_pasid = 0;
>  	int ret, order;
>  	int size;
> 
> @@ -168,7 +168,7 @@ int intel_pasid_alloc_table(struct device *dev)
>  	INIT_LIST_HEAD(&pasid_table->dev);
> 
>  	if (info->pasid_supported)
> -		max_pasid = min_t(int, pci_max_pasids(to_pci_dev(dev)),
> +		max_pasid = min_t(u32, pci_max_pasids(to_pci_dev(dev)),
>  				  intel_pasid_max_id);
> 
>  	size = max_pasid >> (PASID_PDE_SHIFT - 3);
> @@ -242,7 +242,7 @@ int intel_pasid_get_dev_max_id(struct device *dev)
>  	return info->pasid_table->max_pasid;
>  }
> 
> -struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
> +struct pasid_entry *intel_pasid_get_entry(struct device *dev, u32 pasid)
>  {
>  	struct device_domain_info *info;
>  	struct pasid_table *pasid_table;
> @@ -251,8 +251,7 @@ struct pasid_entry *intel_pasid_get_entry(struct device *dev,
> int pasid)
>  	int dir_index, index;
> 
>  	pasid_table = intel_pasid_get_table(dev);
> -	if (WARN_ON(!pasid_table || pasid < 0 ||
> -		    pasid >= intel_pasid_get_dev_max_id(dev)))
> +	if (WARN_ON(!pasid_table || pasid >= intel_pasid_get_dev_max_id(dev)))
>  		return NULL;
> 
>  	dir = pasid_table->table;
> @@ -305,7 +304,7 @@ static inline void pasid_clear_entry_with_fpd(struct
> pasid_entry *pe)
>  }
> 
>  static void
> -intel_pasid_clear_entry(struct device *dev, int pasid, bool fault_ignore)
> +intel_pasid_clear_entry(struct device *dev, u32 pasid, bool fault_ignore)
>  {
>  	struct pasid_entry *pe;
> 
> @@ -444,7 +443,7 @@ pasid_set_eafe(struct pasid_entry *pe)
> 
>  static void
>  pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> -				    u16 did, int pasid)
> +				    u16 did, u32 pasid)
>  {
>  	struct qi_desc desc;
> 
> @@ -473,7 +472,7 @@ iotlb_invalidation_with_pasid(struct intel_iommu *iommu,
> u16 did, u32 pasid)
> 
>  static void
>  devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
> -			       struct device *dev, int pasid)
> +			       struct device *dev, u32 pasid)
>  {
>  	struct device_domain_info *info;
>  	u16 sid, qdep, pfsid;
> @@ -490,7 +489,7 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>  }
> 
>  void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
> -				 int pasid, bool fault_ignore)
> +				 u32 pasid, bool fault_ignore)
>  {
>  	struct pasid_entry *pte;
>  	u16 did;
> @@ -515,7 +514,7 @@ void intel_pasid_tear_down_entry(struct intel_iommu
> *iommu, struct device *dev,
> 
>  static void pasid_flush_caches(struct intel_iommu *iommu,
>  				struct pasid_entry *pte,
> -				int pasid, u16 did)
> +			       u32 pasid, u16 did)
>  {
>  	if (!ecap_coherent(iommu->ecap))
>  		clflush_cache_range(pte, sizeof(*pte));
> @@ -534,7 +533,7 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
>   */
>  int intel_pasid_setup_first_level(struct intel_iommu *iommu,
>  				  struct device *dev, pgd_t *pgd,
> -				  int pasid, u16 did, int flags)
> +				  u32 pasid, u16 did, int flags)
>  {
>  	struct pasid_entry *pte;
> 
> @@ -607,7 +606,7 @@ static inline int iommu_skip_agaw(struct dmar_domain
> *domain,
>   */
>  int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>  				   struct dmar_domain *domain,
> -				   struct device *dev, int pasid)
> +				   struct device *dev, u32 pasid)
>  {
>  	struct pasid_entry *pte;
>  	struct dma_pte *pgd;
> @@ -665,7 +664,7 @@ int intel_pasid_setup_second_level(struct intel_iommu
> *iommu,
>   */
>  int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>  				   struct dmar_domain *domain,
> -				   struct device *dev, int pasid)
> +				   struct device *dev, u32 pasid)
>  {
>  	u16 did = FLPT_DEFAULT_DID;
>  	struct pasid_entry *pte;
> @@ -751,7 +750,7 @@ intel_pasid_setup_bind_data(struct intel_iommu *iommu,
> struct pasid_entry *pte,
>   * @addr_width: Address width of the first level (guest)
>   */
>  int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
> -			     pgd_t *gpgd, int pasid,
> +			     pgd_t *gpgd, u32 pasid,
>  			     struct iommu_gpasid_bind_data_vtd *pasid_data,
>  			     struct dmar_domain *domain, int addr_width)
>  {
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 6c87c807a0ab..778089d198eb 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -23,7 +23,7 @@
>  #include "intel-pasid.h"
> 
>  static irqreturn_t prq_event_thread(int irq, void *d);
> -static void intel_svm_drain_prq(struct device *dev, int pasid);
> +static void intel_svm_drain_prq(struct device *dev, u32 pasid);
> 
>  #define PRQ_ORDER 0
> 
> @@ -371,7 +371,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain,
> struct device *dev,
>  	return ret;
>  }
> 
> -int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +int intel_svm_unbind_gpasid(struct device *dev, u32 pasid)
>  {
>  	struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
>  	struct intel_svm_dev *sdev;
> @@ -601,7 +601,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct
> svm_dev_ops *ops,
>  }
> 
>  /* Caller must hold pasid_mutex */
> -static int intel_svm_unbind_mm(struct device *dev, int pasid)
> +static int intel_svm_unbind_mm(struct device *dev, u32 pasid)
>  {
>  	struct intel_svm_dev *sdev;
>  	struct intel_iommu *iommu;
> @@ -728,7 +728,7 @@ static bool is_canonical_address(u64 addr)
>   * described in VT-d spec CH7.10 to drain all page requests and page
>   * responses pending in the hardware.
>   */
> -static void intel_svm_drain_prq(struct device *dev, int pasid)
> +static void intel_svm_drain_prq(struct device *dev, u32 pasid)
>  {
>  	struct device_domain_info *info;
>  	struct dmar_domain *domain;
> @@ -988,10 +988,10 @@ void intel_svm_unbind(struct iommu_sva *sva)
>  	mutex_unlock(&pasid_mutex);
>  }
> 
> -int intel_svm_get_pasid(struct iommu_sva *sva)
> +u32 intel_svm_get_pasid(struct iommu_sva *sva)
>  {
>  	struct intel_svm_dev *sdev;
> -	int pasid;
> +	u32 pasid;
> 
>  	mutex_lock(&pasid_mutex);
>  	sdev = to_intel_svm_dev(sva);
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d43120eb1dc5..56dbabdd7f5f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2828,7 +2828,7 @@ void iommu_sva_unbind_device(struct iommu_sva
> *handle)
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
> 
> -int iommu_sva_get_pasid(struct iommu_sva *handle)
> +u32 iommu_sva_get_pasid(struct iommu_sva *handle)
>  {
>  	const struct iommu_ops *ops = handle->dev->bus->iommu_ops;
> 
> diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> index 107028e77ca3..5cc6ebccba65 100644
> --- a/drivers/misc/uacce/uacce.c
> +++ b/drivers/misc/uacce/uacce.c
> @@ -92,7 +92,7 @@ static long uacce_fops_compat_ioctl(struct file *filep,
> 
>  static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
>  {
> -	int pasid;
> +	u32 pasid;
>  	struct iommu_sva *handle;
> 
>  	if (!(uacce->flags & UACCE_DEV_SVA))
> diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
> index 21e950e4ab62..450717299928 100644
> --- a/include/linux/amd-iommu.h
> +++ b/include/linux/amd-iommu.h
> @@ -76,7 +76,7 @@ extern void amd_iommu_free_device(struct pci_dev *pdev);
>   *
>   * The function returns 0 on success or a negative value on error.
>   */
> -extern int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
> +extern int amd_iommu_bind_pasid(struct pci_dev *pdev, u32 pasid,
>  				struct task_struct *task);
> 
>  /**
> @@ -88,7 +88,7 @@ extern int amd_iommu_bind_pasid(struct pci_dev *pdev, int
> pasid,
>   * When this function returns the device is no longer using the PASID
>   * and the PASID is no longer bound to its task.
>   */
> -extern void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid);
> +extern void amd_iommu_unbind_pasid(struct pci_dev *pdev, u32 pasid);
> 
>  /**
>   * amd_iommu_set_invalid_ppr_cb() - Register a call-back for failed
> @@ -114,7 +114,7 @@ extern void amd_iommu_unbind_pasid(struct pci_dev *pdev,
> int pasid);
>  #define AMD_IOMMU_INV_PRI_RSP_FAIL	2
> 
>  typedef int (*amd_iommu_invalid_ppr_cb)(struct pci_dev *pdev,
> -					int pasid,
> +					u32 pasid,
>  					unsigned long address,
>  					u16);
> 
> @@ -166,7 +166,7 @@ extern int amd_iommu_device_info(struct pci_dev *pdev,
>   * @cb: The call-back function
>   */
> 
> -typedef void (*amd_iommu_invalidate_ctx)(struct pci_dev *pdev, int pasid);
> +typedef void (*amd_iommu_invalidate_ctx)(struct pci_dev *pdev, u32 pasid);
> 
>  extern int amd_iommu_set_invalidate_ctx_cb(struct pci_dev *pdev,
>  					   amd_iommu_invalidate_ctx cb);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 3e8fa1c7a1e6..643951e28dd4 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -550,7 +550,7 @@ struct dmar_domain {
>  					   2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
>  	u64		max_addr;	/* maximum mapped address */
> 
> -	int		default_pasid;	/*
> +	u32		default_pasid;	/*
>  					 * The default pasid used for non-SVM
>  					 * traffic on mediated devices.
>  					 */
> @@ -707,7 +707,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu,
> u16 sid, u16 pfsid,
>  			      u32 pasid, u16 qdep, u64 addr,
>  			      unsigned int size_order, u64 granu);
>  void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
> -			  int pasid);
> +			  u32 pasid);
> 
>  int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>  		   unsigned int count, unsigned long options);
> @@ -735,11 +735,11 @@ extern int intel_svm_enable_prq(struct intel_iommu
> *iommu);
>  extern int intel_svm_finish_prq(struct intel_iommu *iommu);
>  int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
>  			  struct iommu_gpasid_bind_data *data);
> -int intel_svm_unbind_gpasid(struct device *dev, int pasid);
> +int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
>  struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
>  				 void *drvdata);
>  void intel_svm_unbind(struct iommu_sva *handle);
> -int intel_svm_get_pasid(struct iommu_sva *handle);
> +u32 intel_svm_get_pasid(struct iommu_sva *handle);
>  struct svm_dev_ops;
> 
>  struct intel_svm_dev {
> @@ -748,7 +748,7 @@ struct intel_svm_dev {
>  	struct device *dev;
>  	struct svm_dev_ops *ops;
>  	struct iommu_sva sva;
> -	int pasid;
> +	u32 pasid;
>  	int users;
>  	u16 did;
>  	u16 dev_iotlb:1;
> @@ -761,7 +761,7 @@ struct intel_svm {
> 
>  	struct intel_iommu *iommu;
>  	int flags;
> -	int pasid;
> +	u32 pasid;
>  	int gpasid; /* In case that guest PASID is different from host PASID */
>  	struct list_head devs;
>  	struct list_head list;
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index c9e7e601950d..39d368a810b8 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -11,7 +11,7 @@
>  struct device;
> 
>  struct svm_dev_ops {
> -	void (*fault_cb)(struct device *dev, int pasid, u64 address,
> +	void (*fault_cb)(struct device *dev, u32 pasid, u64 address,
>  			 void *private, int rwxp, int response);
>  };
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 5f0b7859d2eb..9188ec5b39c5 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -292,7 +292,7 @@ struct iommu_ops {
>  	struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
>  				      void *drvdata);
>  	void (*sva_unbind)(struct iommu_sva *handle);
> -	int (*sva_get_pasid)(struct iommu_sva *handle);
> +	u32 (*sva_get_pasid)(struct iommu_sva *handle);
> 
>  	int (*page_response)(struct device *dev,
>  			     struct iommu_fault_event *evt,
> @@ -302,7 +302,7 @@ struct iommu_ops {
>  	int (*sva_bind_gpasid)(struct iommu_domain *domain,
>  			struct device *dev, struct iommu_gpasid_bind_data *data);
> 
> -	int (*sva_unbind_gpasid)(struct device *dev, int pasid);
> +	int (*sva_unbind_gpasid)(struct device *dev, u32 pasid);
> 
>  	int (*def_domain_type)(struct device *dev);
> 
> @@ -656,7 +656,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device
> *dev,
>  					struct mm_struct *mm,
>  					void *drvdata);
>  void iommu_sva_unbind_device(struct iommu_sva *handle);
> -int iommu_sva_get_pasid(struct iommu_sva *handle);
> +u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> 
>  #else /* CONFIG_IOMMU_API */
> 
> @@ -1049,7 +1049,7 @@ static inline void iommu_sva_unbind_device(struct
> iommu_sva *handle)
>  {
>  }
> 
> -static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
> +static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
>  {
>  	return IOMMU_PASID_INVALID;
>  }
> @@ -1068,7 +1068,7 @@ static inline int iommu_sva_bind_gpasid(struct
> iommu_domain *domain,
>  }
> 
>  static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
> -					   struct device *dev, int pasid)
> +					   struct device *dev, u32 pasid)
>  {
>  	return -ENODEV;
>  }
> diff --git a/include/linux/uacce.h b/include/linux/uacce.h
> index 454c2f6672d7..48e319f40275 100644
> --- a/include/linux/uacce.h
> +++ b/include/linux/uacce.h
> @@ -81,7 +81,7 @@ struct uacce_queue {
>  	struct list_head list;
>  	struct uacce_qfile_region *qfrs[UACCE_MAX_REGION];
>  	enum uacce_q_state state;
> -	int pasid;
> +	u32 pasid;
>  	struct iommu_sva *handle;
>  };
> 
> --
> 2.19.1
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v6 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
  2020-07-13 23:47 ` [PATCH v6 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing) Fenghua Yu
@ 2020-07-14  3:25   ` Liu, Yi L
  2020-07-15 23:32     ` Fenghua Yu
  0 siblings, 1 reply; 36+ messages in thread
From: Liu, Yi L @ 2020-07-14  3:25 UTC (permalink / raw)
  To: Yu, Fenghua, Thomas Gleixner, Joerg Roedel, Ingo Molnar,
	Borislav Petkov, Peter Zijlstra, H Peter Anvin, David Woodhouse,
	Lu Baolu, Felix Kuehling, Hansen, Dave, Luck, Tony,
	Jean-Philippe Brucker, Christoph Hellwig, Raj, Ashok, Pan,
	Jacob jun, Jiang, Dave, Mehta, Sohil, Shankar, Ravi V
  Cc: Yu, Fenghua, iommu, x86, linux-kernel, amd-gfx

> From: Fenghua Yu <fenghua.yu@intel.com>
> Sent: Tuesday, July 14, 2020 7:48 AM
> 
> From: Ashok Raj <ashok.raj@intel.com>
> 
> ENQCMD and Data Streaming Accelerator (DSA) and all of their associated features
> are a complicated stack with lots of interconnected pieces.
> This documentation provides a big picture overview for all of the features.
> 
> Signed-off-by: Ashok Raj <ashok.raj@intel.com>
> Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
> v3:
> - Replace deprecated intel_svm_bind_mm() by iommu_sva_bind_mm() (Baolu)
> - Fix a couple of typos (Baolu)
> 
> v2:
> - Fix the doc format and add the doc in toctree (Thomas)
> - Modify the doc for better description (Thomas, Tony, Dave)
> 
>  Documentation/x86/index.rst |   1 +
>  Documentation/x86/sva.rst   | 287 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 288 insertions(+)
>  create mode 100644 Documentation/x86/sva.rst
> 
> diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst index
> 265d9e9a093b..e5d5ff096685 100644
> --- a/Documentation/x86/index.rst
> +++ b/Documentation/x86/index.rst
> @@ -30,3 +30,4 @@ x86-specific Documentation
>     usb-legacy-support
>     i386/index
>     x86_64/index
> +   sva
> diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst new file mode
> 100644 index 000000000000..7242a84169ef
> --- /dev/null
> +++ b/Documentation/x86/sva.rst
> @@ -0,0 +1,287 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================================
> +Shared Virtual Addressing (SVA) with ENQCMD
> +===========================================
> +
> +Background
> +==========
> +
> +Shared Virtual Addressing (SVA) allows the processor and device to use
> +the same virtual addresses avoiding the need for software to translate
> +virtual addresses to physical addresses. SVA is what PCIe calls Shared
> +Virtual Memory (SVM)
> +
> +In addition to the convenience of using application virtual addresses
> +by the device, it also doesn't require pinning pages for DMA.
> +PCIe Address Translation Services (ATS) along with Page Request
> +Interface
> +(PRI) allow devices to function much the same way as the CPU handling
> +application page-faults. For more information please refer to PCIe
> +specification Chapter 10: ATS Specification.
> +

nit: may be helpful to mention Chapter 10 of PCIe spec since 4.0. before that, ATS has its
own specification.

> +Use of SVA requires IOMMU support in the platform. IOMMU also is
> +required to support PCIe features ATS and PRI. ATS allows devices to
> +cache translations for the virtual address. IOMMU driver uses the
> +mmu_notifier() support to keep the device tlb cache and the CPU cache
> +in sync. PRI allows the device to request paging the virtual address
> +before using if they are not paged in the CPU page tables.
> +
> +
> +Shared Hardware Workqueues
> +==========================
> +
> +Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV)
> +permits the use of Shared Work Queues (SWQ) by both applications and
> +Virtual Machines (VM's). This allows better hardware utilization vs.
> +hard partitioning resources that could result in under utilization. In
> +order to allow the hardware to distinguish the context for which work
> +is being executed in the hardware by SWQ interface, SIOV uses Process
> +Address Space ID (PASID), which is a 20bit number defined by the PCIe SIG.
> +
> +PASID value is encoded in all transactions from the device. This allows
> +the IOMMU to track I/O on a per-PASID granularity in addition to using
> +the PCIe Resource Identifier (RID) which is the Bus/Device/Function.
> +
> +
> +ENQCMD
> +======
> +
> +ENQCMD is a new instruction on Intel platforms that atomically submits
> +a work descriptor to a device. The descriptor includes the operation to
> +be performed, virtual addresses of all parameters, virtual address of a
> +completion record, and the PASID (process address space ID) of the current process.
> +
> +ENQCMD works with non-posted semantics and carries a status back if the
> +command was accepted by hardware. This allows the submitter to know if
> +the submission needs to be retried or other device specific mechanisms
> +to implement fairness or ensure forward progress can be made.
> +
> +ENQCMD is the glue that ensures applications can directly submit
> +commands to the hardware and also permit hardware to be aware of
> +application context to perform I/O operations via use of PASID.
> +

maybe a reader will ask about ENQCMDs after reading ENQCMD/S spec. :-)

> +Process Address Space Tagging
> +=============================
> +
> +A new thread scoped MSR (IA32_PASID) provides the connection between
> +user processes and the rest of the hardware. When an application first
> +accesses an SVA capable device this MSR is initialized with a newly
> +allocated PASID. The driver for the device calls an IOMMU specific api
> +that sets up the routing for DMA and page-requests.
> +
> +For example, the Intel Data Streaming Accelerator (DSA) uses
> +iommu_sva_bind_device(), which will do the following.
> +
> +- Allocate the PASID, and program the process page-table (cr3) in the
> +PASID
> +  context entries.

nit: s/PASID context entries/PASID table entries/

> +- Register for mmu_notifier() to track any page-table invalidations to
> +keep
> +  the device tlb in sync. For example, when a page-table entry is

not only device tlb. I guess iotlb is also included.

> +invalidated,
> +  IOMMU propagates the invalidation to device tlb. This will force any
> +  future access by the device to this virtual address to participate in
> +  ATS. If the IOMMU responds with proper response that a page is not
> +  present, the device would request the page to be paged in via the
> +PCIe PRI
> +  protocol before performing I/O.
> +
> +This MSR is managed with the XSAVE feature set as "supervisor state" to
> +ensure the MSR is updated during context switch.
> +
> +PASID Management
> +================
> +
> +The kernel must allocate a PASID on behalf of each process and program
> +it into the new MSR to communicate the process identity to platform hardware.
> +ENQCMD uses the PASID stored in this MSR to tag requests from this process.
> +When a user submits a work descriptor to a device using the ENQCMD
> +instruction, the PASID field in the descriptor is auto-filled with the
> +value from MSR_IA32_PASID. Requests for DMA from the device are also
> +tagged with the same PASID. The platform IOMMU uses the PASID in the

not quite get " Requests for DMA from the device are also tagged with the same PASID"

> +transaction to perform address translation. The IOMMU api's setup the

s/api's/apis/ ?

> +corresponding PASID entry in IOMMU with the process address used by the CPU
> (for e.g cr3 in x86).

with the process page tables used by the CPU (e.g. the page tables pointed by cr3 in x86).

> +
> +The MSR must be configured on each logical CPU before any application

s/MSR/MSR_IA32_PASID/

> +thread can interact with a device. Threads that belong to the same
> +process share the same page tables, thus the same MSR value.

s/MSR/PASID/

> +
> +PASID is cleared when a process is created. The PASID allocation and

s/PASID/MSR_IA32_PASID/

> +MSR programming may occur long after a process and its threads have been
> created.
> +One thread must call bind() to allocate the PASID for the process. If a

s/bind()/iommu_sva_bind_device()/ or say "call iommu api to bind a process with
a device." :-)

> +thread uses ENQCMD without the MSR first being populated, it will cause #GP.
> +The kernel will fix up the #GP by writing the process-wide PASID into
> +the thread that took the #GP. A single process PASID can be used
> +simultaneously with multiple devices since they all share the same address space.

simultaneously with multiple devices if they all share the process address space.

> +
> +New threads could inherit the MSR value from the parent. But this would

s/MSR/MSR_IA32_PASID/

> +involve additional state management for those threads which may never
> +use ENQCMD. Clearing the MSR at thread creation permits all threads to
> +have a consistent behavior; the PASID is only programmed when the
> +thread calls the bind() api (iommu_sva_bind_device()()), or when a
> +thread calls ENQCMD for the first time.
> +
> +PASID Lifecycle Management
> +==========================
> +
> +Only processes that access SVA capable devices need to have a PASID
> +allocated. This allocation happens when a process first opens an SVA
> +capable device (subsequent opens of the same, or other devices will
> +share the same PASID).
> +
> +Although the PASID is allocated to the process by opening a device, it
> +is not active in any of the threads of that process. Activation is done
> +lazily when a thread tries to submit a work descriptor to a device
> +using the ENQCMD.
> +
> +That first access will trigger a #GP fault because the IA32_PASID MSR
> +has not been initialized with the PASID value assigned to the process
> +when the device was opened. The Linux #GP handler notes that a PASID as
> +been allocated for the process, and so initializes the IA32_PASID MSR
> +and returns so that the ENQCMD instruction is re-executed.
> +
> +On fork(2) or exec(2) the PASID is removed from the process as it no
> +longer has the same address space that it had when the device was opened.
> +
> +On clone(2) the new task shares the same address space, so will be able
> +to use the PASID allocated to the process. The IA32_PASID is not
> +preemptively initialized as the kernel does not know whether this
> +thread is going to access the device.
> +
> +On exit(2) the PASID is freed. The device driver ensures that any
> +pending operations queued to the device are either completed or aborted
> +before allowing the PASID to be reallocated.
> +
> +Relationships
> +=============
> +
> + * Each process has many threads, but only one PASID
> + * Devices have a limited number (~10's to 1000's) of hardware
> +   workqueues and each portal maps down to a single workqueue.
> +   The device driver manages allocating hardware workqueues.
> + * A single mmap() maps a single hardware workqueue as a "portal"
> + * For each device with which a process interacts, there must be
> +   one or more mmap()'d portals.
> + * Many threads within a process can share a single portal to access
> +   a single device.
> + * Multiple processes can separately mmap() the same portal, in
> +   which case they still share one device hardware workqueue.
> + * The single process-wide PASID is used by all threads to interact
> +   with all devices.  There is not, for instance, a PASID for each

s/with all devices/with all devices manipulated by the process/

Regards,
Yi Liu

> +   thread or each thread<->device pair.
> +
> +FAQ
> +===
> +
> +* What is SVA/SVM?
> +
> +Shared Virtual Addressing (SVA) permits I/O hardware and the processor
> +to work in the same address space. In short, sharing the address space.
> +Some call it Shared Virtual Memory (SVM), but Linux community wanted to
> +avoid it with Posix Shared Memory and Secure Virtual Machines which
> +were terms already in circulation.
> +
> +* What is a PASID?
> +
> +A Process Address Space ID (PASID) is a PCIe-defined TLP Prefix. A
> +PASID is a 20 bit number allocated and managed by the OS. PASID is
> +included in all transactions between the platform and the device.
> +
> +* How are shared work queues different?
> +
> +Traditionally to allow user space applications interact with hardware,
> +there is a separate instance required per process. For example,
> +consider doorbells as a mechanism of informing hardware about work to
> +process. Each doorbell is required to be spaced 4k (or page-size) apart
> +for process isolation. This requires hardware to provision that space
> +and reserve in MMIO. This doesn't scale as the number of threads
> +becomes quite large. The hardware also manages the queue depth for
> +Shared Work Queues (SWQ), and consumers don't need to track queue
> +depth. If there is no space to accept a command, the device will return
> +an error indicating retry. Also submitting a command to an MMIO address
> +that can't accept ENQCMD will return retry in response. In the new DMWr
> +PCIe terminology, devices need to support DMWr completer capability. In
> +addition it requires all switch ports to support DMWr routing and must
> +be enabled by the PCIe subsystem, much like how PCIe Atomics() are managed for
> instance.
> +
> +SWQ allows hardware to provision just a single address in the device.
> +When used with ENQCMD to submit work, the device can distinguish the
> +process submitting the work since it will include the PASID assigned to
> +that process. This decreases the pressure of hardware requiring to
> +support hardware to scale to a large number of processes.
> +
> +* Is this the same as a user space device driver?
> +
> +Communicating with the device via the shared work queue is much simpler
> +than a full blown user space driver. The kernel driver does all the
> +initialization of the hardware. User space only needs to worry about
> +submitting work and processing completions.
> +
> +* Is this the same as SR-IOV?
> +
> +Single Root I/O Virtualization (SR-IOV) focuses on providing
> +independent hardware interfaces for virtualizing hardware. Hence its
> +required to be almost fully functional interface to software supporting
> +the traditional BAR's, space for interrupts via MSI-x, its own register layout.
> +Virtual Functions (VFs) are assisted by the Physical Function (PF)
> +driver.
> +
> +Scalable I/O Virtualization builds on the PASID concept to create
> +device instances for virtualization. SIOV requires host software to
> +assist in creating virtual devices, each virtual device is represented
> +by a PASID along with the BDF of the device.  This allows device
> +hardware to optimize device resource creation and can grow dynamically
> +on demand. SR-IOV creation and management is very static in nature.
> +Consult references below for more details.
> +
> +* Why not just create a virtual function for each app?
> +
> +Creating PCIe SRIOV type virtual functions (VF) are expensive. They
> +create duplicated hardware for PCI config space requirements,
> +Interrupts such as MSIx for instance. Resources such as interrupts have
> +to be hard partitioned between VF's at creation time, and cannot scale
> +dynamically on demand. The VF's are not completely independent from the
> +Physical function (PF). Most VF's require some communication and
> +assistance from the PF driver. SIOV creates a software defined device.
> +Where all the configuration and control aspects are mediated via the
> +slow path. The work submission and completion happen without any mediation.
> +
> +* Does this support virtualization?
> +
> +ENQCMD can be used from within a guest VM. In these cases the VMM helps
> +with setting up a translation table to translate from Guest PASID to
> +Host PASID. Please consult the ENQCMD instruction set reference for
> +more details.
> +
> +* Does memory need to be pinned?
> +
> +When devices support SVA, along with platform hardware such as IOMMU
> +supporting such devices, there is no need to pin memory for DMA purposes.
> +Devices that support SVA also support other PCIe features that remove
> +the pinning requirement for memory.
> +
> +Device TLB support - Device requests the IOMMU to lookup an address
> +before use via Address Translation Service (ATS) requests.  If the
> +mapping exists but there is no page allocated by the OS, IOMMU hardware
> +returns that no mapping exists.
> +
> +Device requests that virtual address to be mapped via Page Request
> +Interface (PRI). Once the OS has successfully completed  the mapping,
> +it returns the response back to the device. The device continues again
> +to request for a translation and continues.
> +
> +IOMMU works with the OS in managing consistency of page-tables with the
> +device. When removing pages, it interacts with the device to remove any
> +device-tlb that might have been cached before removing the mappings
> +from the OS.
> +
> +References
> +==========
> +
> +VT-D:
> +https://01.org/blogs/ashokraj/2018/recent-enhancements-intel-virtualiza
> +tion-technology-directed-i/o-intel-vt-d
> +
> +SIOV:
> +https://01.org/blogs/2019/assignable-interfaces-intel-scalable-i/o-virt
> +ualization-linux
> +
> +ENQCMD in ISE:
> +https://software.intel.com/sites/default/files/managed/c5/15/architectu
> +re-instruction-set-extensions-programming-reference.pdf
> +
> +DSA spec:
> +https://software.intel.com/sites/default/files/341204-intel-data-stream
> +ing-accelerator-spec.pdf
> --
> 2.19.1
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 01/12] iommu: Change type of pasid to u32
  2020-07-14  2:45   ` Liu, Yi L
@ 2020-07-14 13:54     ` Fenghua Yu
  2020-07-14 13:56       ` Liu, Yi L
  0 siblings, 1 reply; 36+ messages in thread
From: Fenghua Yu @ 2020-07-14 13:54 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Peter Zijlstra, Hansen, Dave, H Peter Anvin,
	Jean-Philippe Brucker, Jiang, Dave, Raj, Ashok, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Shankar, Ravi V, Borislav Petkov,
	Thomas Gleixner, Luck, Tony, Felix Kuehling, linux-kernel, iommu,
	Pan, Jacob jun, David Woodhouse

On Mon, Jul 13, 2020 at 07:45:49PM -0700, Liu, Yi L wrote:
> > From: Fenghua Yu <fenghua.yu@intel.com>
> > Sent: Tuesday, July 14, 2020 7:48 AM
> >
> > PASID is defined as a few different types in iommu including "int",
> > "u32", and "unsigned int". To be consistent and to match with uapi
> > definitions, define PASID and its variations (e.g. max PASID) as "u32".
> > "u32" is also shorter and a little more explicit than "unsigned int".
> >
> > No PASID type change in uapi although it defines PASID as __u64 in
> > some places.
> 
> just out of curious, why not using ioasid_t? In Linux kernel, PASID is
> managed by ioasid.

ioasid_t is only used in limited underneath files (ioasid.c and ioasid.h).
Instead of changing hundreds of places to use ioasid_t, it's better to
keep ioasid_t only used in the files.

And it's explict and matches with uapi to define PASID as u32. Changing
to ioasid_t in so many files (amd, gpu, crypto, etc) may confuse upper
users on "why ioasid_t".

So we had better to explicitly define PASID as u32 and keep ioasid_t in
the limited underneath files.

Thanks.

-Fenghua
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [PATCH v6 01/12] iommu: Change type of pasid to u32
  2020-07-14 13:54     ` Fenghua Yu
@ 2020-07-14 13:56       ` Liu, Yi L
  0 siblings, 0 replies; 36+ messages in thread
From: Liu, Yi L @ 2020-07-14 13:56 UTC (permalink / raw)
  To: Yu, Fenghua
  Cc: Peter Zijlstra, Hansen, Dave, H Peter Anvin,
	Jean-Philippe Brucker, Jiang, Dave, Raj,  Ashok, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Shankar, Ravi V, Borislav Petkov,
	Thomas Gleixner, Luck,  Tony, Felix Kuehling, linux-kernel,
	iommu, Pan, Jacob jun, David Woodhouse

> From: Yu, Fenghua <fenghua.yu@intel.com>
> Sent: Tuesday, July 14, 2020 9:55 PM
> On Mon, Jul 13, 2020 at 07:45:49PM -0700, Liu, Yi L wrote:
> > > From: Fenghua Yu <fenghua.yu@intel.com>
> > > Sent: Tuesday, July 14, 2020 7:48 AM
> > >
> > > PASID is defined as a few different types in iommu including "int",
> > > "u32", and "unsigned int". To be consistent and to match with uapi
> > > definitions, define PASID and its variations (e.g. max PASID) as "u32".
> > > "u32" is also shorter and a little more explicit than "unsigned int".
> > >
> > > No PASID type change in uapi although it defines PASID as __u64 in
> > > some places.
> >
> > just out of curious, why not using ioasid_t? In Linux kernel, PASID is
> > managed by ioasid.
> 
> ioasid_t is only used in limited underneath files (ioasid.c and ioasid.h).
> Instead of changing hundreds of places to use ioasid_t, it's better to keep ioasid_t
> only used in the files.
> 
> And it's explict and matches with uapi to define PASID as u32. Changing to ioasid_t in
> so many files (amd, gpu, crypto, etc) may confuse upper users on "why ioasid_t".
> 
> So we had better to explicitly define PASID as u32 and keep ioasid_t in the limited
> underneath files.

fair enough, thanks,

Regards,
Yi Liu

> Thanks.
> 
> -Fenghua
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)
  2020-07-14  3:25   ` Liu, Yi L
@ 2020-07-15 23:32     ` Fenghua Yu
  0 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-15 23:32 UTC (permalink / raw)
  To: Liu, Yi L
  Cc: Peter Zijlstra, Hansen, Dave, H Peter Anvin,
	Jean-Philippe Brucker, Jiang, Dave, Raj, Ashok, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Shankar, Ravi V, Borislav Petkov,
	Thomas Gleixner, Luck, Tony, Felix Kuehling, linux-kernel, iommu,
	Pan, Jacob jun, David Woodhouse

Hi, Yi,

On Mon, Jul 13, 2020 at 08:25:20PM -0700, Liu, Yi L wrote:
> > From: Fenghua Yu <fenghua.yu@intel.com>
> > Sent: Tuesday, July 14, 2020 7:48 AM
> > From: Ashok Raj <ashok.raj@intel.com>

Thank you for your comments!

But I think we don't need to update this patch because the current text
is better than suggested changes. I would rather to keep this patch
unchanged. Please see my following explanations for details.

> > +(PRI) allow devices to function much the same way as the CPU handling
> > +application page-faults. For more information please refer to PCIe
> > +specification Chapter 10: ATS Specification.
> > +
> 
> nit: may be helpful to mention Chapter 10 of PCIe spec since 4.0. before that, ATS has its
> own specification.

This doc only refers to the latest specs. Older specs are not useful
for this spec.

> > +ENQCMD is the glue that ensures applications can directly submit
> > +commands to the hardware and also permit hardware to be aware of
> > +application context to perform I/O operations via use of PASID.
> > +
> 
> maybe a reader will ask about ENQCMDs after reading ENQCMD/S spec. :-)

This doc only covers ENQCMD. ENQCMDS is out of scope of this doc.

> > +- Allocate the PASID, and program the process page-table (cr3) in the
> > +PASID
> > +  context entries.
> 
> nit: s/PASID context entries/PASID table entries/

Kernel IOMMU does use PASID context. So we would keep the current text.

> 
> > +- Register for mmu_notifier() to track any page-table invalidations to
> > +keep
> > +  the device tlb in sync. For example, when a page-table entry is
> 
> not only device tlb. I guess iotlb is also included.

I think they are same/similar concepts. No need to duplicate here.

> > +it into the new MSR to communicate the process identity to platform hardware.
> > +ENQCMD uses the PASID stored in this MSR to tag requests from this process.
> > +When a user submits a work descriptor to a device using the ENQCMD
> > +instruction, the PASID field in the descriptor is auto-filled with the
> > +value from MSR_IA32_PASID. Requests for DMA from the device are also
> > +tagged with the same PASID. The platform IOMMU uses the PASID in the
> 
> not quite get " Requests for DMA from the device are also tagged with the
> same PASID"

The DMA access from device to the process address space uses the same PASID
set up in the MSR.
 
> 
> > +transaction to perform address translation. The IOMMU api's setup the
> 
> s/api's/apis/ ?


> 
> > +corresponding PASID entry in IOMMU with the process address used by the CPU
> > (for e.g cr3 in x86).
> 
> with the process page tables used by the CPU (e.g. the page tables pointed by cr3 in x86).

We use address to match the "address space" specified in the term of PASID.
page table is implementation details. So we would keep the current text.

> > +The MSR must be configured on each logical CPU before any application
> 
> s/MSR/MSR_IA32_PASID/

The MSR refers to MSR_IA32_PASID. We don't need to repeat long
"MSR_IA32_PASID" everywhere while a short "the MSR" is better and clear
in the doc.

> > +thread can interact with a device. Threads that belong to the same
> > +process share the same page tables, thus the same MSR value.
> 
> s/MSR/PASID/

The "MSR" is better because we focus on the MSR value here that is set up for
each thread.

> 
> > +
> > +PASID is cleared when a process is created. The PASID allocation and
> 
> s/PASID/MSR_IA32_PASID/

No. It is PASID that is cleared per process creation. We are not talking about
the MSR here although the MSR will cleared as part of process FPU init.

> 
> > +MSR programming may occur long after a process and its threads have been
> > created.
> > +One thread must call bind() to allocate the PASID for the process. If a
> 
> s/bind()/iommu_sva_bind_device()/ or say "call iommu api to bind a process with
> a device." :-)

bind() is better because the binding function may be changed (e.g. it was
changed in 5.8). People know bind() means binding. Even in the future the
binding function is changed again, we don't need to change the text here
if using bind().

> 
> > +thread uses ENQCMD without the MSR first being populated, it will cause #GP.
> > +The kernel will fix up the #GP by writing the process-wide PASID into
> > +the thread that took the #GP. A single process PASID can be used
> > +simultaneously with multiple devices since they all share the same address space.
> 
> simultaneously with multiple devices if they all share the process address
> space.

Using "since" is better. It explains "why".

> 
> > +
> > +New threads could inherit the MSR value from the parent. But this would
> 
> s/MSR/MSR_IA32_PASID/

Ditto. It's unnecessary to use long "MSR_IA32_PASID" everywhere.
"The MSR" is concise and clear.

Thanks.

-Fenghua
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 01/12] iommu: Change type of pasid to u32
  2020-07-13 23:47 ` [PATCH v6 01/12] iommu: Change type of pasid to u32 Fenghua Yu
  2020-07-14  2:45   ` Liu, Yi L
@ 2020-07-22 14:03   ` Joerg Roedel
  2020-07-22 17:21     ` Fenghua Yu
  1 sibling, 1 reply; 36+ messages in thread
From: Joerg Roedel @ 2020-07-22 14:03 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Jean-Philippe Brucker, Tony Luck, Dave Jiang, Ashok Raj,
	Ravi V Shankar, Peter Zijlstra, Felix Kuehling, x86,
	linux-kernel, amd-gfx, Christoph Hellwig, Dave Hansen, iommu,
	Ingo Molnar, Borislav Petkov, Jacob Jun Pan, H Peter Anvin,
	Thomas Gleixner, David Woodhouse

On Mon, Jul 13, 2020 at 04:47:56PM -0700, Fenghua Yu wrote:
> PASID is defined as a few different types in iommu including "int",
> "u32", and "unsigned int". To be consistent and to match with uapi
> definitions, define PASID and its variations (e.g. max PASID) as "u32".
> "u32" is also shorter and a little more explicit than "unsigned int".
> 
> No PASID type change in uapi although it defines PASID as __u64 in
> some places.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>

For the IOMMU parts:

Acked-by: Joerg Roedel <jroedel@suse.de>

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 01/12] iommu: Change type of pasid to u32
  2020-07-22 14:03   ` Joerg Roedel
@ 2020-07-22 17:21     ` Fenghua Yu
  0 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-07-22 17:21 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Bruce Schlobohm, Peter Zijlstra, Dave Hansen, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Ravi V Shankar, Borislav Petkov,
	Thomas Gleixner, Tony Luck, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, David Woodhouse

Hi, Joerg,

On Wed, Jul 22, 2020 at 04:03:40PM +0200, Joerg Roedel wrote:
> On Mon, Jul 13, 2020 at 04:47:56PM -0700, Fenghua Yu wrote:
> > PASID is defined as a few different types in iommu including "int",
> > "u32", and "unsigned int". To be consistent and to match with uapi
> > definitions, define PASID and its variations (e.g. max PASID) as "u32".
> > "u32" is also shorter and a little more explicit than "unsigned int".
> > 
> > No PASID type change in uapi although it defines PASID as __u64 in
> > some places.
> > 
> > Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
> > Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
> 
> For the IOMMU parts:
> 
> Acked-by: Joerg Roedel <jroedel@suse.de>

Thank you!

-Fenghua 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-07-13 23:48 ` [PATCH v6 12/12] x86/traps: Fix up invalid PASID Fenghua Yu
@ 2020-07-31 23:34   ` Andy Lutomirski
  2020-08-01  0:42     ` Fenghua Yu
  2020-08-03 15:03     ` Dave Hansen
  2020-08-01  1:28   ` Andy Lutomirski
  1 sibling, 2 replies; 36+ messages in thread
From: Andy Lutomirski @ 2020-07-31 23:34 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Peter Zijlstra, Dave Hansen, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Ravi V Shankar, Borislav Petkov,
	Thomas Gleixner, Tony Luck, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, David Woodhouse

On Mon, Jul 13, 2020 at 4:48 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> A #GP fault is generated when ENQCMD instruction is executed without
> a valid PASID value programmed in the current thread's PASID MSR. The
> #GP fault handler will initialize the MSR if a PASID has been allocated
> for this process.
>
> Decoding the user instruction is ugly and sets a bad architecture
> precedent. It may not function if the faulting instruction is modified
> after #GP.
>
> Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
> without a valid PASID value programmed. #GP error codes are 16 bits and all
> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
> choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
> when loading from the source operand, so its not fully comprehending all
> the reasons. Rather than special case the ENQCMD, in future Intel may
> choose a different fault mechanism for such cases if recovery is needed on
> #GP.

Decoding the user instruction is ugly and sets a bad architecture
precedent, but we already do it in #GP for UMIP.  So I'm unconvinced.

Memo to Intel, though: you REALLY need to start thinking about what
the heck an OS is supposed to do with all these new faults you're
coming up with.  The new #NM for TILE is utterly nonsensical.  Sure,
it works for an OS that does not use CR0.TS and as long as no one
tries to extend the same mechanism for some new optional piece of
state, but as soon as Intel tries to use the same mechanism for
anything else, it falls apart.

Please do better.

> +
> +/*
> + * Write the current task's PASID MSR/state. This is called only when PASID
> + * is enabled.
> + */
> +static void fpu__pasid_write(u32 pasid)
> +{
> +       u64 msr_val = pasid | MSR_IA32_PASID_VALID;
> +
> +       fpregs_lock();
> +
> +       /*
> +        * If the MSR is active and owned by the current task's FPU, it can
> +        * be directly written.
> +        *
> +        * Otherwise, write the fpstate.
> +        */
> +       if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> +               wrmsrl(MSR_IA32_PASID, msr_val);
> +       } else {
> +               struct ia32_pasid_state *ppasid_state;
> +
> +               ppasid_state = get_xsave_addr(&current->thread.fpu.state.xsave,
> +                                             XFEATURE_PASID);
> +               /*
> +                * ppasid_state shouldn't be NULL because XFEATURE_PASID
> +                * is enabled.
> +                */
> +               WARN_ON_ONCE(!ppasid_state);
> +               ppasid_state->pasid = msr_val;

WARN instead of BUG is nice, but you'll immediate oops if this fails.
How about:

if (!WARN_ON_ONCE(!ppasid_state))
  ppasid_state->pasid = msr_val;
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-07-31 23:34   ` Andy Lutomirski
@ 2020-08-01  0:42     ` Fenghua Yu
  2020-08-03 15:03     ` Dave Hansen
  1 sibling, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-08-01  0:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ravi V Shankar, Peter Zijlstra, Dave Hansen, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Fenghua Yu, Borislav Petkov,
	Thomas Gleixner, Tony Luck, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, David Woodhouse

Hi, Andy,

On Fri, Jul 31, 2020 at 04:34:11PM -0700, Andy Lutomirski wrote:
> On Mon, Jul 13, 2020 at 4:48 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > A #GP fault is generated when ENQCMD instruction is executed without
> > a valid PASID value programmed in the current thread's PASID MSR. The
> > #GP fault handler will initialize the MSR if a PASID has been allocated
> > for this process.
> >
> > Decoding the user instruction is ugly and sets a bad architecture
> > precedent. It may not function if the faulting instruction is modified
> > after #GP.
> >
> > Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
> > without a valid PASID value programmed. #GP error codes are 16 bits and all
> > 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
> > choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
> > when loading from the source operand, so its not fully comprehending all
> > the reasons. Rather than special case the ENQCMD, in future Intel may
> > choose a different fault mechanism for such cases if recovery is needed on
> > #GP.
> 
> Decoding the user instruction is ugly and sets a bad architecture
> precedent, but we already do it in #GP for UMIP.  So I'm unconvinced.

Maybe just remove the "Decoding the user instruction ... bad architecture
precedent" sentence? The sentence is vague.

As described in the following "It may not function ..." sentence, the real
issue of parsing the instruction is the instruction may be modified by
another processor before it's parsed in the #GP handler.

If just keep the "It may not function ..." sentence, is that good enough to
explain why we don't parse the faulting instruction?

> 
> Memo to Intel, though: you REALLY need to start thinking about what
> the heck an OS is supposed to do with all these new faults you're
> coming up with.  The new #NM for TILE is utterly nonsensical.  Sure,
> it works for an OS that does not use CR0.TS and as long as no one
> tries to extend the same mechanism for some new optional piece of
> state, but as soon as Intel tries to use the same mechanism for
> anything else, it falls apart.
> 
> Please do better.

Internally we did discuss the error code in #GP for PASID with HW architects.
But due to some uarch reason, it's not simple to report the error code for
PASID:( Please see previous discussion on the error code for PASID:
https://lore.kernel.org/lkml/20200427224646.GA103955@otc-nc-03/

It's painful for our SW guys to check exception reasons if hardware
doesn't explicitly tell us.

Hopefully the heuristics (fixup the PASID MSR if the process already has
a valid PASID but the MSR doesn't have one yet) in this patch is acceptable.
 
> 
> > +
> > +/*
> > + * Write the current task's PASID MSR/state. This is called only when PASID
> > + * is enabled.
> > + */
> > +static void fpu__pasid_write(u32 pasid)
> > +{
> > +       u64 msr_val = pasid | MSR_IA32_PASID_VALID;
> > +
> > +       fpregs_lock();
> > +
> > +       /*
> > +        * If the MSR is active and owned by the current task's FPU, it can
> > +        * be directly written.
> > +        *
> > +        * Otherwise, write the fpstate.
> > +        */
> > +       if (!test_thread_flag(TIF_NEED_FPU_LOAD)) {
> > +               wrmsrl(MSR_IA32_PASID, msr_val);
> > +       } else {
> > +               struct ia32_pasid_state *ppasid_state;
> > +
> > +               ppasid_state = get_xsave_addr(&current->thread.fpu.state.xsave,
> > +                                             XFEATURE_PASID);
> > +               /*
> > +                * ppasid_state shouldn't be NULL because XFEATURE_PASID
> > +                * is enabled.
> > +                */
> > +               WARN_ON_ONCE(!ppasid_state);
> > +               ppasid_state->pasid = msr_val;
> 
> WARN instead of BUG is nice, but you'll immediate oops if this fails.
> How about:
> 
> if (!WARN_ON_ONCE(!ppasid_state))
>   ppasid_state->pasid = msr_val;

OK. I will fix this issue.

Thank you very much for your review!

-Fenghua
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-07-13 23:48 ` [PATCH v6 12/12] x86/traps: Fix up invalid PASID Fenghua Yu
  2020-07-31 23:34   ` Andy Lutomirski
@ 2020-08-01  1:28   ` Andy Lutomirski
  2020-08-03 17:19     ` Fenghua Yu
  1 sibling, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2020-08-01  1:28 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Peter Zijlstra, Dave Hansen, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Ravi V Shankar, Borislav Petkov,
	Thomas Gleixner, Tony Luck, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, David Woodhouse

On Mon, Jul 13, 2020 at 4:48 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> A #GP fault is generated when ENQCMD instruction is executed without
> a valid PASID value programmed in the current thread's PASID MSR. The
> #GP fault handler will initialize the MSR if a PASID has been allocated
> for this process.

Let's take a step back here.  Why are we trying to avoid IPIs?  If you
call munmap(), you IPI other CPUs running tasks in the current mm.  If
you do perf_event_open() and thus acquire RDPMC permission, you IPI
other CPUs running tasks in the current mm.  If you call modify_ldt(),
you IPI other CPUs running tasks in the current mm.  These events can
all happen more than once per process.

Now we have ENQCMD.  An mm can be assigned a PASID *once* in the model
that these patches support.  Why not just send an IPI using
essentially identical code to the LDT sync or the CR4.PCE sync?

--Andy
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 09/12] x86/process: Clear PASID state for a newly forked/cloned thread
  2020-07-13 23:48 ` [PATCH v6 09/12] x86/process: Clear PASID state for a newly forked/cloned thread Fenghua Yu
@ 2020-08-01  1:44   ` Andy Lutomirski
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2020-08-01  1:44 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Peter Zijlstra, Dave Hansen, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Ravi V Shankar, Borislav Petkov,
	Thomas Gleixner, Tony Luck, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, David Woodhouse

On Mon, Jul 13, 2020 at 4:48 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> The PASID state has to be cleared on forks, since the child has a
> different address space. The PASID is also cleared for thread clone. While
> it would be correct to inherit the PASID in this case, it is unknown
> whether the new task will use ENQCMD. Giving it the PASID "just in case"
> would have the downside of increased context switch overhead to setting
> the PASID MSR.
>
> Since #GP faults have to be handled on any threads that were created before
> the PASID was assigned to the mm of the process, newly created threads
> might as well be treated in a consistent way.

Just how much context switch overhead are we talking about here?
Unless the CPU works differently than I expect, I would guess you are
saving exactly zero cycles.  What am I missing?

--Andy


>
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
> v2:
> - Modify init_task_pasid().
>
>  arch/x86/kernel/process.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index f362ce0d5ac0..1b1492e337a6 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -121,6 +121,21 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
>                 return do_set_thread_area_64(p, ARCH_SET_FS, tls);
>  }
>
> +/* Initialize the PASID state for the forked/cloned thread. */
> +static void init_task_pasid(struct task_struct *task)
> +{
> +       struct ia32_pasid_state *ppasid;
> +
> +       /*
> +        * Initialize the PASID state so that the PASID MSR will be
> +        * initialized to its initial state (0) by XRSTORS when the task is
> +        * scheduled for the first time.
> +        */
> +       ppasid = get_xsave_addr(&task->thread.fpu.state.xsave, XFEATURE_PASID);
> +       if (ppasid)
> +               ppasid->pasid = INIT_PASID;
> +}
> +
>  int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
>                     unsigned long arg, struct task_struct *p, unsigned long tls)
>  {
> @@ -174,6 +189,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
>         task_user_gs(p) = get_user_gs(current_pt_regs());
>  #endif
>
> +       if (static_cpu_has(X86_FEATURE_ENQCMD))
> +               init_task_pasid(p);
> +
>         /* Set a new TLS for the child thread? */
>         if (clone_flags & CLONE_SETTLS)
>                 ret = set_new_tls(p, tls);
> --
> 2.19.1
>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-07-31 23:34   ` Andy Lutomirski
  2020-08-01  0:42     ` Fenghua Yu
@ 2020-08-03 15:03     ` Dave Hansen
  2020-08-03 15:12       ` Andy Lutomirski
  1 sibling, 1 reply; 36+ messages in thread
From: Dave Hansen @ 2020-08-03 15:03 UTC (permalink / raw)
  To: Andy Lutomirski, Fenghua Yu
  Cc: Jean-Philippe Brucker, Tony Luck, Dave Jiang, Ashok Raj,
	Ravi V Shankar, Peter Zijlstra, x86, linux-kernel, amd-gfx,
	Christoph Hellwig, iommu, Ingo Molnar, Borislav Petkov,
	Jacob Jun Pan, H Peter Anvin, Thomas Gleixner, David Woodhouse,
	Felix Kuehling

On 7/31/20 4:34 PM, Andy Lutomirski wrote:
>> Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
>> without a valid PASID value programmed. #GP error codes are 16 bits and all
>> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
>> choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
>> when loading from the source operand, so its not fully comprehending all
>> the reasons. Rather than special case the ENQCMD, in future Intel may
>> choose a different fault mechanism for such cases if recovery is needed on
>> #GP.
> Decoding the user instruction is ugly and sets a bad architecture
> precedent, but we already do it in #GP for UMIP.  So I'm unconvinced.

I'll try to do one more bit of convincing. :)

In the end, we need a way to figure out if the #GP was from a known "OK"
source that we can fix up.  You're right that we could fire up the
instruction decoder to help answer that question.  But, it (also)
doesn't easily yield a perfect answer as to the source of the #GP, it
always involves a user copy, and it's a larger code impact than what
we've got.

I think I went and looked at fixup_umip_exception(), and compared it to
the alternative which is essentially just these three lines of code:

> +	/*
> +	 * If the current task already has a valid PASID in the MSR,
> +	 * the #GP must be for some other reason.
> +	 */
> +	if (current->has_valid_pasid)
> +		return false;
...> +	/* Now the current task has a valid PASID in the MSR. */
> +	current->has_valid_pasid = 1;

and *I* was convinced that instruction decoding wasn't worth it.

There's a lot of stuff that fixup_umip_exception() does which we don't
have to duplicate, but it's going to be really hard to get it anywhere
near as compact as what we've got.

I guess Fenghua could go give instruction decoding a shot and see how it
looks, though.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-08-03 15:03     ` Dave Hansen
@ 2020-08-03 15:12       ` Andy Lutomirski
  2020-08-03 15:19         ` Raj, Ashok
  2020-08-03 16:36         ` Dave Hansen
  0 siblings, 2 replies; 36+ messages in thread
From: Andy Lutomirski @ 2020-08-03 15:12 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ravi V Shankar, Peter Zijlstra, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Fenghua Yu, Borislav Petkov,
	Andy Lutomirski, Thomas Gleixner, Tony Luck, Felix Kuehling,
	linux-kernel, iommu, Jacob Jun Pan, David Woodhouse

On Mon, Aug 3, 2020 at 8:03 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 7/31/20 4:34 PM, Andy Lutomirski wrote:
> >> Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
> >> without a valid PASID value programmed. #GP error codes are 16 bits and all
> >> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
> >> choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
> >> when loading from the source operand, so its not fully comprehending all
> >> the reasons. Rather than special case the ENQCMD, in future Intel may
> >> choose a different fault mechanism for such cases if recovery is needed on
> >> #GP.
> > Decoding the user instruction is ugly and sets a bad architecture
> > precedent, but we already do it in #GP for UMIP.  So I'm unconvinced.
>
> I'll try to do one more bit of convincing. :)
>
> In the end, we need a way to figure out if the #GP was from a known "OK"
> source that we can fix up.  You're right that we could fire up the
> instruction decoder to help answer that question.  But, it (also)
> doesn't easily yield a perfect answer as to the source of the #GP, it
> always involves a user copy, and it's a larger code impact than what
> we've got.
>
> I think I went and looked at fixup_umip_exception(), and compared it to
> the alternative which is essentially just these three lines of code:
>
> > +     /*
> > +      * If the current task already has a valid PASID in the MSR,
> > +      * the #GP must be for some other reason.
> > +      */
> > +     if (current->has_valid_pasid)
> > +             return false;
> ...> +  /* Now the current task has a valid PASID in the MSR. */
> > +     current->has_valid_pasid = 1;
>
> and *I* was convinced that instruction decoding wasn't worth it.
>
> There's a lot of stuff that fixup_umip_exception() does which we don't
> have to duplicate, but it's going to be really hard to get it anywhere
> near as compact as what we've got.
>

I could easily be convinced that the PASID fixup is so trivial and so
obviously free of misfiring in a way that causes an infinite loop that
this code is fine.  But I think we first need to answer the bigger
question of why we're doing a lazy fixup in the first place.

--Andy
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-08-03 15:12       ` Andy Lutomirski
@ 2020-08-03 15:19         ` Raj, Ashok
  2020-08-03 16:36         ` Dave Hansen
  1 sibling, 0 replies; 36+ messages in thread
From: Raj, Ashok @ 2020-08-03 15:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ravi V Shankar, Peter Zijlstra, Dave Hansen, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Fenghua Yu, Borislav Petkov,
	Thomas Gleixner, Tony Luck, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, David Woodhouse

On Mon, Aug 03, 2020 at 08:12:18AM -0700, Andy Lutomirski wrote:
> On Mon, Aug 3, 2020 at 8:03 AM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 7/31/20 4:34 PM, Andy Lutomirski wrote:
> > >> Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
> > >> without a valid PASID value programmed. #GP error codes are 16 bits and all
> > >> 16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
> > >> choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
> > >> when loading from the source operand, so its not fully comprehending all
> > >> the reasons. Rather than special case the ENQCMD, in future Intel may
> > >> choose a different fault mechanism for such cases if recovery is needed on
> > >> #GP.
> > > Decoding the user instruction is ugly and sets a bad architecture
> > > precedent, but we already do it in #GP for UMIP.  So I'm unconvinced.
> >
> > I'll try to do one more bit of convincing. :)
> >
> > In the end, we need a way to figure out if the #GP was from a known "OK"
> > source that we can fix up.  You're right that we could fire up the
> > instruction decoder to help answer that question.  But, it (also)
> > doesn't easily yield a perfect answer as to the source of the #GP, it
> > always involves a user copy, and it's a larger code impact than what
> > we've got.
> >
> > I think I went and looked at fixup_umip_exception(), and compared it to
> > the alternative which is essentially just these three lines of code:
> >
> > > +     /*
> > > +      * If the current task already has a valid PASID in the MSR,
> > > +      * the #GP must be for some other reason.
> > > +      */
> > > +     if (current->has_valid_pasid)
> > > +             return false;
> > ...> +  /* Now the current task has a valid PASID in the MSR. */
> > > +     current->has_valid_pasid = 1;
> >
> > and *I* was convinced that instruction decoding wasn't worth it.
> >
> > There's a lot of stuff that fixup_umip_exception() does which we don't
> > have to duplicate, but it's going to be really hard to get it anywhere
> > near as compact as what we've got.
> >
> 
> I could easily be convinced that the PASID fixup is so trivial and so
> obviously free of misfiring in a way that causes an infinite loop that
> this code is fine.  But I think we first need to answer the bigger
> question of why we're doing a lazy fixup in the first place.

We choose lazy fixup for 2 reasons. 

- If some threads were already created before the MSR is programmed, then
  we need to fixup those in a race free way. scheduling some task-work etc.
  We did do that early on, but decided it was ugly. 
- Not all threads need to submit ENQCMD, force feeding the MSR probably
  isn't even required for all. Yes the overhead isn't probably big, but
  might not even be required for all threads.

We needed to fixup MSR in two different way. To keep the code simple, the
choice was to only fixup on #GP, that eliminated the extra code we need to
support case1.

Cheers,
Ashok
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-08-03 15:12       ` Andy Lutomirski
  2020-08-03 15:19         ` Raj, Ashok
@ 2020-08-03 16:36         ` Dave Hansen
  2020-08-03 17:16           ` Andy Lutomirski
  1 sibling, 1 reply; 36+ messages in thread
From: Dave Hansen @ 2020-08-03 16:36 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ravi V Shankar, Peter Zijlstra, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Fenghua Yu, Borislav Petkov,
	Thomas Gleixner, Tony Luck, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, David Woodhouse

On 8/3/20 8:12 AM, Andy Lutomirski wrote:
> I could easily be convinced that the PASID fixup is so trivial and so
> obviously free of misfiring in a way that causes an infinite loop that
> this code is fine.  But I think we first need to answer the bigger
> question of why we're doing a lazy fixup in the first place.

There was an (internal to Intel) implementation of this about a year ago
that used smp_call_function_many() to force the MSR state into all
threads of a process.  I took one look at it, decided there was a 0%
chance of it actually functioning and recommended we find another way.
While I'm sure it could be done more efficiently, the implementation I
looked at took ~200 lines of code and comments.  It started to look too
much like another instance of mm->cpumask for my comfort.

The only other option I could think of would be an ABI where threads
were required to call into the kernel at least once after creation
before calling ENQCMD.  All ENQCMDs would be required to be "wrapped" by
code doing this syscall.  Something like this:

	if (!thread_local(did_pasid_init))
		sys_pasid_init(); // new syscall or prctl

	thread_local(did_pasid_init) = 1;

	// ENQCMD here
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-08-03 16:36         ` Dave Hansen
@ 2020-08-03 17:16           ` Andy Lutomirski
  2020-08-03 17:34             ` Dave Hansen
  0 siblings, 1 reply; 36+ messages in thread
From: Andy Lutomirski @ 2020-08-03 17:16 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ravi V Shankar, Peter Zijlstra, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Fenghua Yu, Borislav Petkov,
	Andy Lutomirski, Thomas Gleixner, Tony Luck, Felix Kuehling,
	linux-kernel, iommu, Jacob Jun Pan, David Woodhouse

On Mon, Aug 3, 2020 at 9:37 AM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 8/3/20 8:12 AM, Andy Lutomirski wrote:
> > I could easily be convinced that the PASID fixup is so trivial and so
> > obviously free of misfiring in a way that causes an infinite loop that
> > this code is fine.  But I think we first need to answer the bigger
> > question of why we're doing a lazy fixup in the first place.
>
> There was an (internal to Intel) implementation of this about a year ago
> that used smp_call_function_many() to force the MSR state into all
> threads of a process.  I took one look at it, decided there was a 0%
> chance of it actually functioning and recommended we find another way.
> While I'm sure it could be done more efficiently, the implementation I
> looked at took ~200 lines of code and comments.  It started to look too
> much like another instance of mm->cpumask for my comfort.

If I were implementing this, I would try making switch_mm_irqs_off()
do, roughly:

void load_mm_pasid(...) {
  if (cpu_feature_enabled(X86_FEATURE_ENQCMD))
    tsk->xstate[offset] = READ_ONCE(next->context.pasid);
}

This costs one cache miss, although the cache line in question is
about to be read anyway.  It might be faster to, instead, do:

void load_mm_pasid(...) {
  u32 pasid = READ_ONCE(next->context.pasid);

  if (tsk->xstate[offset] != pasid)
    tsk->state[offset] = pasid;
}

so we don't dirty the cache line in the common case.  The actual
generated code ought to be pretty good -- surely the offset of PASID
in XSTATE is an entry in an array somewhere that can be found with a
single read, right?

The READ_ONCE is because this could race against a write to
context.pasid, so this code needs to be at the end of the function
where it's protected by mm_cpumask.  With all this done, the pasid
update is just on_each_cpu_mask(mm_cpumask(mm), load_mm_pasid, mm,
true).

This looks like maybe 20 lines of code.  As an added bonus, it lets us
free PASIDs early if we ever decide we want to.




May I take this opportunity to ask Intel to please put some real
thought into future pieces of CPU state?  Here's a summary of some
things we have:

- Normal extended state (FPU, XMM, etc): genuinely per thread and only
ever used explicitly.  Actually makes sense with XSAVE(C/S).

- PKRU: affects CPL0-originated memory accesses, so it needs to be
eagerly loaded in the kernel.  Does not make sense with XRSTOR(C/S),
but it's in there anyway.

- CR3: per-mm state.  Makes some sense in CR3, but it's tangled up
with CR4 in nasty ways.

- LDTR: per-mm on Linux and mostly obsolete everyone.  In it's own
register, so it's not a big deal.

- PASID: per-mm state (surely Intel always intended it to be per-mm,
since it's for shared _virtual memory_!).  But for some reason it's in
an MSR (which is slow), and it's cleverly, but not that cleverly,
accessible with XSAVES/XRSTORS.  Doesn't actually make sense.  Also,
PASID is lazy-loadable, but the mechanism for telling the kernel that
a lazy load is needed got flubbed.

- TILE: genuinely per-thread, but it's expensive so it's
lazy-loadable.  But the lazy-load mechanism reuses #NM, and it's not
fully disambiguated from the other use of #NM.  So it sort of works,
but it's gross.

- "KERNEL_GS_BASE", i.e. the shadow GS base.  This is logically
per-user-thread state, but it's only accessible in MSRs.  For some
reason this is *not* in XSAVES/XRSTORS state, nor is there any
efficient way to access it at all.

- Segment registers: can't be properly saved except by hypervisors,
and can almost, but not quite, be properly loaded (assuming the state
was sane to begin with) by normal kernels.  Just don't try to load 1,
2, or 3 into any of them.

Sometimes I think that this is all intended to be as confusing as
possible and that it's all a ploy to keep context switches slow and
complicated.  Maybe Intel doesn't actually want to compete with other
architectures that can context switch quickly?


It would be really nice if we had a clean way to load per-mm state
(see my private emails about this), a clean way to load CPL3 register
state, and a clean way to load per-user-thread *kernel* register state
(e.g. PKRU and probably PKRS).  And there should be an exception that
says "user code accessed a lazy-loaded resource that isn't loaded, and
this is the resource it tried to access".
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-08-01  1:28   ` Andy Lutomirski
@ 2020-08-03 17:19     ` Fenghua Yu
  0 siblings, 0 replies; 36+ messages in thread
From: Fenghua Yu @ 2020-08-03 17:19 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ravi V Shankar, Peter Zijlstra, Dave Hansen, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Fenghua Yu, Borislav Petkov,
	Thomas Gleixner, Tony Luck, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, David Woodhouse

Hi, Andy,

On Fri, Jul 31, 2020 at 06:28:37PM -0700, Andy Lutomirski wrote:
> On Mon, Jul 13, 2020 at 4:48 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
> >
> > A #GP fault is generated when ENQCMD instruction is executed without
> > a valid PASID value programmed in the current thread's PASID MSR. The
> > #GP fault handler will initialize the MSR if a PASID has been allocated
> > for this process.
> 
> Let's take a step back here.  Why are we trying to avoid IPIs?  If you
> call munmap(), you IPI other CPUs running tasks in the current mm.  If
> you do perf_event_open() and thus acquire RDPMC permission, you IPI
> other CPUs running tasks in the current mm.  If you call modify_ldt(),
> you IPI other CPUs running tasks in the current mm.  These events can
> all happen more than once per process.
> 
> Now we have ENQCMD.  An mm can be assigned a PASID *once* in the model
> that these patches support.  Why not just send an IPI using
> essentially identical code to the LDT sync or the CR4.PCE sync?

ldt (or the other two cases) is different from ENQCMD: the PASID MSR
is per-task and is supported by xsaves.

The per-task PASID MSR needs to updated to ALL tasks. That means IPI,
which only updates running tasks' MSRs, is not enough. All tasks' MSRs
need to be updated when a PASID is allocated.

This difference increases the complexity of sending IPI to running tasks
and updating sleeping tasks's MSRs with locking etc.

Of course, it's doable not to update the MSRs in all task when a new PASID
is allocated to the mm. But that means we need to discard xsaves support
for the MSR and create our own switch function to load the MSR. That
increases complexity.

We tried similar IPI way to update the PASID in about 200 lines of code.
As Dave Hansen pointed, it's too complex. The current lazy updating the MSR
only takes essential 3 lines of code in #GP.

Does it make sense to still use the current fix up method to update the MSR?

Thanks.

-Fenghua
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-08-03 17:16           ` Andy Lutomirski
@ 2020-08-03 17:34             ` Dave Hansen
  2020-08-03 19:24               ` Andy Lutomirski
  0 siblings, 1 reply; 36+ messages in thread
From: Dave Hansen @ 2020-08-03 17:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ravi V Shankar, Peter Zijlstra, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Fenghua Yu, Borislav Petkov,
	Thomas Gleixner, Tony Luck, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, David Woodhouse

On 8/3/20 10:16 AM, Andy Lutomirski wrote:
> - TILE: genuinely per-thread, but it's expensive so it's
> lazy-loadable.  But the lazy-load mechanism reuses #NM, and it's not
> fully disambiguated from the other use of #NM.  So it sort of works,
> but it's gross.

For those playing along at home, there's a new whitepaper out from Intel
about some new CPU features which are going to be fun:

> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

Which part were you worried about?  I thought it was fully disambuguated
from this:

> When XFD causes an instruction to generate #NM, the processor loads
> the IA32_XFD_ERR MSR to identify the disabled state component(s).
> Specifically, the MSR is loaded with the logical AND of the IA32_XFD
> MSR and the bitmap corresponding to the state components required by
> the faulting instruction.
> 
> Device-not-available exceptions that are not due to XFD — those 
> resulting from setting CR0.TS to 1 — do not modify the IA32_XFD_ERR
> MSR.

So if you always make sure to *clear* IA32_XFD_ERR after handing and XFD
exception, any #NM's with a clear IA32_XFD_ERR are from "legacy"
CR0.TS=1.  Any bits set in IA32_XFD_ERR mean a new-style XFD exception.

Am I missing something?
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 12/12] x86/traps: Fix up invalid PASID
  2020-08-03 17:34             ` Dave Hansen
@ 2020-08-03 19:24               ` Andy Lutomirski
  0 siblings, 0 replies; 36+ messages in thread
From: Andy Lutomirski @ 2020-08-03 19:24 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Ravi V Shankar, Peter Zijlstra, H Peter Anvin,
	Jean-Philippe Brucker, Dave Jiang, Ashok Raj, x86, amd-gfx,
	Christoph Hellwig, Ingo Molnar, Fenghua Yu, Borislav Petkov,
	Andy Lutomirski, Thomas Gleixner, Tony Luck, Felix Kuehling,
	linux-kernel, iommu, Jacob Jun Pan, David Woodhouse




> On Aug 3, 2020, at 10:34 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 8/3/20 10:16 AM, Andy Lutomirski wrote:
>> - TILE: genuinely per-thread, but it's expensive so it's
>> lazy-loadable.  But the lazy-load mechanism reuses #NM, and it's not
>> fully disambiguated from the other use of #NM.  So it sort of works,
>> but it's gross.
> 
> For those playing along at home, there's a new whitepaper out from Intel
> about some new CPU features which are going to be fun:
> 
>> https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf
> 
> Which part were you worried about?  I thought it was fully disambuguated
> from this:
> 
>> When XFD causes an instruction to generate #NM, the processor loads
>> the IA32_XFD_ERR MSR to identify the disabled state component(s).
>> Specifically, the MSR is loaded with the logical AND of the IA32_XFD
>> MSR and the bitmap corresponding to the state components required by
>> the faulting instruction.
>> 
>> Device-not-available exceptions that are not due to XFD — those 
>> resulting from setting CR0.TS to 1 — do not modify the IA32_XFD_ERR
>> MSR.
> 
> So if you always make sure to *clear* IA32_XFD_ERR after handing and XFD
> exception, any #NM's with a clear IA32_XFD_ERR are from "legacy"
> CR0.TS=1.  Any bits set in IA32_XFD_ERR mean a new-style XFD exception.
> 
> Am I missing something?

I don’t think you’re missing anything, but this mechanism seems to be solidly in the category of “just barely works”, kind of like #DB and DR6, which also just barely works.

And this PASID vs #GP mess is just sad. We’re trying to engineer around an issue that has no need to exist in the first place.  For some reason we have two lazy-loading-fault mechanisms showing up in x86 in rapid succession, one of them is maybe 3/4-baked, and the other is totally separate and maybe 1/4 baked.

If ENQCMD instead raise #NM, this would be trivial. (And it even makes sense — the error is, literally, “an instruction tried to use something in XSTATE that isn’t initialized”.). Or if someone had noticed that, kind of like PKRU, PASID didn’t really belong in XSTATE, we wouldn’t have this mess.

Yes, obviously Linux can get all this stuff to work, but maybe Intel could aspire to design features that are straightforward to use well instead of merely possible to use well?
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 08/12] fork: Clear PASID for new mm
  2020-07-13 23:48 ` [PATCH v6 08/12] fork: Clear PASID for new mm Fenghua Yu
@ 2021-02-24 10:19   ` Jean-Philippe Brucker
  2021-02-25 22:17     ` Fenghua Yu
  0 siblings, 1 reply; 36+ messages in thread
From: Jean-Philippe Brucker @ 2021-02-24 10:19 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Ashok Raj, zhangfei.gao, linux-kernel, linux-mm, iommu, Jacob Jun Pan

Hi Fenghua,

[Trimmed the Cc list]

On Mon, Jul 13, 2020 at 04:48:03PM -0700, Fenghua Yu wrote:
> When a new mm is created, its PASID should be cleared, i.e. the PASID is
> initialized to its init state 0 on both ARM and X86.

I just noticed this patch was dropped in v7, and am wondering whether we
could still upstream it. Does x86 need a child with a new address space
(!CLONE_VM) to inherit the PASID of the parent?  That doesn't make much
sense with regard to IOMMU structures - same PASID indexing multiple PGDs?

Currently iommu_sva_alloc_pasid() assumes mm->pasid is always initialized
to 0 and fails on forked tasks. I'm trying to figure out how to fix this.
Could we clear the pasid on fork or does it break the x86 model?

Thanks,
Jean

> 
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> ---
> v2:
> - Add this patch to initialize PASID value for a new mm.
> 
>  include/linux/mm_types.h | 2 ++
>  kernel/fork.c            | 8 ++++++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index d61285cfe027..d60d2ec10881 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -22,6 +22,8 @@
>  #endif
>  #define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))
>  
> +/* Initial PASID value is 0. */
> +#define INIT_PASID	0
>  
>  struct address_space;
>  struct mem_cgroup;
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 142b23645d82..43b5f112604d 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1007,6 +1007,13 @@ static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
>  #endif
>  }
>  
> +static void mm_init_pasid(struct mm_struct *mm)
> +{
> +#ifdef CONFIG_IOMMU_SUPPORT
> +	mm->pasid = INIT_PASID;
> +#endif
> +}
> +
>  static void mm_init_uprobes_state(struct mm_struct *mm)
>  {
>  #ifdef CONFIG_UPROBES
> @@ -1035,6 +1042,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
>  	mm_init_cpumask(mm);
>  	mm_init_aio(mm);
>  	mm_init_owner(mm, p);
> +	mm_init_pasid(mm);
>  	RCU_INIT_POINTER(mm->exe_file, NULL);
>  	mmu_notifier_subscriptions_init(mm);
>  	init_tlb_flush_pending(mm);
> -- 
> 2.19.1
> 
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 08/12] fork: Clear PASID for new mm
  2021-02-24 10:19   ` Jean-Philippe Brucker
@ 2021-02-25 22:17     ` Fenghua Yu
  2021-03-01 23:00       ` Jacob Pan
  0 siblings, 1 reply; 36+ messages in thread
From: Fenghua Yu @ 2021-02-25 22:17 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Ashok Raj, zhangfei.gao, linux-kernel, linux-mm, iommu, Jacob Jun Pan

Hi, Jean,

On Wed, Feb 24, 2021 at 11:19:27AM +0100, Jean-Philippe Brucker wrote:
> Hi Fenghua,
> 
> [Trimmed the Cc list]
> 
> On Mon, Jul 13, 2020 at 04:48:03PM -0700, Fenghua Yu wrote:
> > When a new mm is created, its PASID should be cleared, i.e. the PASID is
> > initialized to its init state 0 on both ARM and X86.
> 
> I just noticed this patch was dropped in v7, and am wondering whether we
> could still upstream it. Does x86 need a child with a new address space
> (!CLONE_VM) to inherit the PASID of the parent?  That doesn't make much
> sense with regard to IOMMU structures - same PASID indexing multiple PGDs?

You are right: x86 should clear mm->pasid when a new mm is created.
This patch somehow is losted:(

> 
> Currently iommu_sva_alloc_pasid() assumes mm->pasid is always initialized
> to 0 and fails on forked tasks. I'm trying to figure out how to fix this.
> Could we clear the pasid on fork or does it break the x86 model?

x86 calls ioasid_alloc() instead of iommu_sva_alloc_pasid(). So
functionality is not a problem without this patch on x86. But I think
we do need to have this patch in the kernel because PASID is per addr
space and two addr spaces shouldn't have the same PASID.

Who will accept this patch?

Thanks.

-Fenghua
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 08/12] fork: Clear PASID for new mm
  2021-02-25 22:17     ` Fenghua Yu
@ 2021-03-01 23:00       ` Jacob Pan
  2021-03-02 10:43         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 36+ messages in thread
From: Jacob Pan @ 2021-03-01 23:00 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Jean-Philippe Brucker, Ashok Raj, zhangfei.gao, linux-kernel,
	linux-mm, iommu, jacob.jun.pan

Hi Fenghua,

On Thu, 25 Feb 2021 22:17:11 +0000, Fenghua Yu <fenghua.yu@intel.com> wrote:

> Hi, Jean,
> 
> On Wed, Feb 24, 2021 at 11:19:27AM +0100, Jean-Philippe Brucker wrote:
> > Hi Fenghua,
> > 
> > [Trimmed the Cc list]
> > 
> > On Mon, Jul 13, 2020 at 04:48:03PM -0700, Fenghua Yu wrote:  
> > > When a new mm is created, its PASID should be cleared, i.e. the PASID
> > > is initialized to its init state 0 on both ARM and X86.  
> > 
> > I just noticed this patch was dropped in v7, and am wondering whether we
> > could still upstream it. Does x86 need a child with a new address space
> > (!CLONE_VM) to inherit the PASID of the parent?  That doesn't make much
> > sense with regard to IOMMU structures - same PASID indexing multiple
> > PGDs?  
> 
> You are right: x86 should clear mm->pasid when a new mm is created.
> This patch somehow is losted:(
> 
> > 
> > Currently iommu_sva_alloc_pasid() assumes mm->pasid is always
> > initialized to 0 and fails on forked tasks. I'm trying to figure out
> > how to fix this. Could we clear the pasid on fork or does it break the
> > x86 model?  
> 
> x86 calls ioasid_alloc() instead of iommu_sva_alloc_pasid(). So
We should consolidate at some point, there is no need to store pasid in two
places.

> functionality is not a problem without this patch on x86. But I think
I feel the reason that x86 doesn't care is that mm->pasid is not used
unless bind_mm is called. For the fork children even mm->pasid is non-zero,
it has no effect since it is not loaded onto MSRs.
Perhaps you could also add a check or WARN_ON(!mm->pasid) in load_pasid()?

> we do need to have this patch in the kernel because PASID is per addr
> space and two addr spaces shouldn't have the same PASID.
> 
Agreed.

> Who will accept this patch?
> 
> Thanks.
> 
> -Fenghua


Thanks,

Jacob
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v6 08/12] fork: Clear PASID for new mm
  2021-03-01 23:00       ` Jacob Pan
@ 2021-03-02 10:43         ` Jean-Philippe Brucker
  0 siblings, 0 replies; 36+ messages in thread
From: Jean-Philippe Brucker @ 2021-03-02 10:43 UTC (permalink / raw)
  To: Jacob Pan
  Cc: Fenghua Yu, Ashok Raj, zhangfei.gao, linux-kernel, linux-mm, iommu

On Mon, Mar 01, 2021 at 03:00:11PM -0800, Jacob Pan wrote:
> > functionality is not a problem without this patch on x86. But I think
> I feel the reason that x86 doesn't care is that mm->pasid is not used
> unless bind_mm is called.

I think vt-d also maintains the global_svm_list, that tells whether a
PASID is already allocated for the mm. The SMMU driver relies only on
iommu_sva_alloc_pasid() for this.

> For the fork children even mm->pasid is non-zero,
> it has no effect since it is not loaded onto MSRs.
> Perhaps you could also add a check or WARN_ON(!mm->pasid) in load_pasid()?
> 
> > we do need to have this patch in the kernel because PASID is per addr
> > space and two addr spaces shouldn't have the same PASID.
> > 
> Agreed.
> 
> > Who will accept this patch?

It's not clear from get_maintainers.pl, but I guess it should go via
linux-mm. Since the list wasn't cc'd on the original patch, I resent it.

Thanks,
Jean
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2021-03-02 10:44 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-13 23:47 [PATCH v6 00/12] x86: tag application address space for devices Fenghua Yu
2020-07-13 23:47 ` [PATCH v6 01/12] iommu: Change type of pasid to u32 Fenghua Yu
2020-07-14  2:45   ` Liu, Yi L
2020-07-14 13:54     ` Fenghua Yu
2020-07-14 13:56       ` Liu, Yi L
2020-07-22 14:03   ` Joerg Roedel
2020-07-22 17:21     ` Fenghua Yu
2020-07-13 23:47 ` [PATCH v6 02/12] iommu/vt-d: Change flags type to unsigned int in binding mm Fenghua Yu
2020-07-13 23:47 ` [PATCH v6 03/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing) Fenghua Yu
2020-07-14  3:25   ` Liu, Yi L
2020-07-15 23:32     ` Fenghua Yu
2020-07-13 23:47 ` [PATCH v6 04/12] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions Fenghua Yu
2020-07-13 23:48 ` [PATCH v6 05/12] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature Fenghua Yu
2020-07-13 23:48 ` [PATCH v6 06/12] x86/msr-index: Define IA32_PASID MSR Fenghua Yu
2020-07-13 23:48 ` [PATCH v6 07/12] mm: Define pasid in mm Fenghua Yu
2020-07-13 23:48 ` [PATCH v6 08/12] fork: Clear PASID for new mm Fenghua Yu
2021-02-24 10:19   ` Jean-Philippe Brucker
2021-02-25 22:17     ` Fenghua Yu
2021-03-01 23:00       ` Jacob Pan
2021-03-02 10:43         ` Jean-Philippe Brucker
2020-07-13 23:48 ` [PATCH v6 09/12] x86/process: Clear PASID state for a newly forked/cloned thread Fenghua Yu
2020-08-01  1:44   ` Andy Lutomirski
2020-07-13 23:48 ` [PATCH v6 10/12] x86/mmu: Allocate/free PASID Fenghua Yu
2020-07-13 23:48 ` [PATCH v6 11/12] sched: Define and initialize a flag to identify valid PASID in the task Fenghua Yu
2020-07-13 23:48 ` [PATCH v6 12/12] x86/traps: Fix up invalid PASID Fenghua Yu
2020-07-31 23:34   ` Andy Lutomirski
2020-08-01  0:42     ` Fenghua Yu
2020-08-03 15:03     ` Dave Hansen
2020-08-03 15:12       ` Andy Lutomirski
2020-08-03 15:19         ` Raj, Ashok
2020-08-03 16:36         ` Dave Hansen
2020-08-03 17:16           ` Andy Lutomirski
2020-08-03 17:34             ` Dave Hansen
2020-08-03 19:24               ` Andy Lutomirski
2020-08-01  1:28   ` Andy Lutomirski
2020-08-03 17:19     ` Fenghua Yu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).