dri-devel.lists.freedesktop.org archive mirror
* [PATCH 00/22] XeKmd basic SVM support
@ 2023-12-21  4:37 Oak Zeng
  2023-12-21  4:37 ` [PATCH 01/22] drm/xe/svm: Add SVM document Oak Zeng
                   ` (21 more replies)
  0 siblings, 22 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

This is the very basic SVM (shared virtual memory) support in the XeKmd
driver. SVM allows the programmer to use a shared virtual address space
between a CPU program and a GPU program. It abstracts away from the user
the location of the backing memory in a mixed CPU and GPU programming
environment.
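
For illustration, here is a rough sketch of what SVM enables from the
user's point of view (gpu_submit() is a placeholder for whatever
execution uAPI the user mode driver ends up using, it is not something
this series adds):

    /* plain malloc'ed memory, no gem_create/vm_bind needed */
    int *data = malloc(size);

    data[0] = 1;                /* CPU writes through the pointer */
    gpu_submit(queue, data);    /* GPU dereferences the same pointer */
    printf("%d\n", data[0]);    /* CPU reads the result back */

The SVM layer decides where the backing pages live and migrates them
between system memory and GPU vram transparently.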

This work is based on the previous i915 SVM implementation, mainly from
Niranjana Vishwanathapura and Oak Zeng, which was never upstreamed.
This is our first attempt to upstream this work.

This implementation depends on Linux kernel HMM support. See some key
designs in patch #1.

We are aware there is currently an effort to implement SVM using
GMEM (generalized memory management,
see https://lore.kernel.org/dri-devel/20231128125025.4449-1-weixi.zhu@huawei.com/).
We are open to this new method if it can be merged into the upstream kernel.
Until then, we think it is still safer to support SVM through HMM.

This series only has basic SVM support. We think it is better to post
this series early so we can get more eyes on it. Below is the work
that is planned or ongoing:

*Testing: We are working on the igt tests right now. Some parts of this
series, especially the GPU page table update (patches #7, #8) and the
migration function (patch #10), still need debugging to work.

*Virtual address range based memory attributes and hints: We plan to
expose uAPI for the user to set memory attributes, such as preferred
location or migration granularity, on a virtual address range. This is
important for tuning SVM performance.

*GPU vram eviction: One key design choice of this series is that the SVM
layer allocates GPU memory directly from the drm buddy allocator, instead
of from the xe vram manager. There is no BO (buffer object) concept
in this implementation. The key benefit of this approach is that we can
easily migrate memory at page granularity. This also means SVM bypasses
TTM's memory eviction logic. But we want SVM memory and BO driver
memory to be able to mutually evict each other. We have some proof of
concept work to rework the TTM resource manager for this purpose, see
https://lore.kernel.org/dri-devel/20231102043306.2931989-1-oak.zeng@intel.com/
We will continue to work on that series and then implement SVM's eviction
function based on the concept of a drm LRU list shared between SVM and
the TTM/BO driver.

Oak Zeng (22):
  drm/xe/svm: Add SVM document
  drm/xe/svm: Add svm key data structures
  drm/xe/svm: create xe svm during vm creation
  drm/xe/svm: Trace svm creation
  drm/xe/svm: add helper to retrieve svm range from address
  drm/xe/svm: Introduce a helper to build sg table from hmm range
  drm/xe/svm: Add helper for binding hmm range to gpu
  drm/xe/svm: Add helper to invalidate svm range from GPU
  drm/xe/svm: Remap and provide memmap backing for GPU vram
  drm/xe/svm: Introduce svm migration function
  drm/xe/svm: implement functions to allocate and free device memory
  drm/xe/svm: Trace buddy block allocation and free
  drm/xe/svm: Handle CPU page fault
  drm/xe/svm: trace svm range migration
  drm/xe/svm: Implement functions to register and unregister mmu
    notifier
  drm/xe/svm: Implement the mmu notifier range invalidate callback
  drm/xe/svm: clean up svm range during process exit
  drm/xe/svm: Move a few structures to xe_gt.h
  drm/xe/svm: migrate svm range to vram
  drm/xe/svm: Populate svm range
  drm/xe/svm: GPU page fault support
  drm/xe/svm: Add DRM_XE_SVM kernel config entry

 Documentation/gpu/xe/index.rst       |   1 +
 Documentation/gpu/xe/xe_svm.rst      |   8 +
 drivers/gpu/drm/xe/Kconfig           |  22 ++
 drivers/gpu/drm/xe/Makefile          |   5 +
 drivers/gpu/drm/xe/xe_device_types.h |  20 ++
 drivers/gpu/drm/xe/xe_gt.h           |  20 ++
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  28 +--
 drivers/gpu/drm/xe/xe_migrate.c      | 213 +++++++++++++++++
 drivers/gpu/drm/xe/xe_migrate.h      |   7 +
 drivers/gpu/drm/xe/xe_mmio.c         |  12 +
 drivers/gpu/drm/xe/xe_pt.c           | 134 ++++++++++-
 drivers/gpu/drm/xe/xe_pt.h           |   5 +
 drivers/gpu/drm/xe/xe_svm.c          | 324 +++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm.h          | 115 +++++++++
 drivers/gpu/drm/xe/xe_svm_devmem.c   | 232 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm_doc.h      | 121 ++++++++++
 drivers/gpu/drm/xe/xe_svm_migrate.c  | 345 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm_range.c    | 227 ++++++++++++++++++
 drivers/gpu/drm/xe/xe_trace.h        |  71 +++++-
 drivers/gpu/drm/xe/xe_vm.c           |   7 +
 drivers/gpu/drm/xe/xe_vm_types.h     |  12 +
 21 files changed, 1904 insertions(+), 25 deletions(-)
 create mode 100644 Documentation/gpu/xe/xe_svm.rst
 create mode 100644 drivers/gpu/drm/xe/xe_svm.c
 create mode 100644 drivers/gpu/drm/xe/xe_svm.h
 create mode 100644 drivers/gpu/drm/xe/xe_svm_devmem.c
 create mode 100644 drivers/gpu/drm/xe/xe_svm_doc.h
 create mode 100644 drivers/gpu/drm/xe/xe_svm_migrate.c
 create mode 100644 drivers/gpu/drm/xe/xe_svm_range.c

-- 
2.26.3



* [PATCH 01/22] drm/xe/svm: Add SVM document
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
@ 2023-12-21  4:37 ` Oak Zeng
  2023-12-21  4:37 ` [PATCH 02/22] drm/xe/svm: Add svm key data structures Oak Zeng
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Add shared virtual memory document.

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 Documentation/gpu/xe/index.rst  |   1 +
 Documentation/gpu/xe/xe_svm.rst |   8 +++
 drivers/gpu/drm/xe/xe_svm_doc.h | 121 ++++++++++++++++++++++++++++++++
 3 files changed, 130 insertions(+)
 create mode 100644 Documentation/gpu/xe/xe_svm.rst
 create mode 100644 drivers/gpu/drm/xe/xe_svm_doc.h

diff --git a/Documentation/gpu/xe/index.rst b/Documentation/gpu/xe/index.rst
index c224ecaee81e..106b60aba1f0 100644
--- a/Documentation/gpu/xe/index.rst
+++ b/Documentation/gpu/xe/index.rst
@@ -23,3 +23,4 @@ DG2, etc is provided to prototype the driver.
    xe_firmware
    xe_tile
    xe_debugging
+   xe_svm
diff --git a/Documentation/gpu/xe/xe_svm.rst b/Documentation/gpu/xe/xe_svm.rst
new file mode 100644
index 000000000000..62954ba1c6f8
--- /dev/null
+++ b/Documentation/gpu/xe/xe_svm.rst
@@ -0,0 +1,8 @@
+.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)
+
+=====================
+Shared virtual memory
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/xe/xe_svm_doc.h
+   :doc: Shared virtual memory
diff --git a/drivers/gpu/drm/xe/xe_svm_doc.h b/drivers/gpu/drm/xe/xe_svm_doc.h
new file mode 100644
index 000000000000..de38ee3585e4
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_svm_doc.h
@@ -0,0 +1,121 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef _XE_SVM_DOC_H_
+#define _XE_SVM_DOC_H_
+
+/**
+ * DOC: Shared virtual memory
+ *
+ * Shared Virtual Memory (SVM) allows the programmer to use a single virtual
+ * address space shared between threads executing on CPUs and GPUs. It abstracts
+ * away from the user the location of the backing memory, and hence simplifies
+ * the user programming model. In a non-SVM memory model, the user needs to
+ * explicitly decide memory placement (device or system memory), and also needs
+ * to explicitly migrate memory between device and system memory.
+ *
+ * Interface
+ * =========
+ *
+ * SVM makes use of the default OS memory allocation and mapping interfaces,
+ * such as malloc() and mmap(). The pointers returned from malloc() and mmap()
+ * can be used directly in both CPU and GPU programs.
+ *
+ * SVM also provides an API to set virtual address range based memory attributes
+ * such as preferred memory location, memory migration granularity, and memory
+ * atomic attributes. This is similar to the Linux madvise API.
+ *
+ * Basic implementation
+ * ====================
+ *
+ * The XeKMD implementation is based on the Linux kernel Heterogeneous Memory
+ * Management (HMM) framework. HMM's address space mirroring support allows
+ * sharing of the address space by duplicating sections of CPU page tables in the
+ * device page tables. This enables both CPU and GPU to access a physical memory
+ * location using the same virtual address.
+ *
+ * The Linux kernel also provides the ability to plug device memory into the
+ * system (as a special ZONE_DEVICE type) and allocates a struct page for each
+ * device memory page.
+ *
+ * HMM also provides a mechanism to migrate pages from host to device memory and
+ * vice versa.
+ *
+ * More information on HMM can be found here.
+ * https://www.kernel.org/doc/Documentation/vm/hmm.rst
+ *
+ * Unlike the non-SVM memory allocators (such as gem_create, vm_bind etc.), there
+ * is no buffer object (BO, such as struct ttm_buffer_object or struct drm_gem_object)
+ * in our SVM implementation. We deliberately choose this implementation option
+ * to achieve page granularity memory placement, validation, eviction and migration.
+ *
+ * The SVM layer directly allocates device memory from the drm buddy subsystem.
+ * The memory is organized as blocks, each of which has 2^n pages. The SVM
+ * subsystem then marks the usage of each page using a simple bitmap. When all
+ * pages in a block are no longer used, SVM returns the block to drm buddy.
+ *
+ * There are 3 events which can trigger the SVM subsystem into action:
+ *
+ * 1. A mmu notifier callback
+ *
+ * Since SVM needs to mirror the program's CPU virtual address space on the GPU
+ * side, whenever the program's CPU address space changes, SVM needs to make an
+ * identical change on the GPU side. SVM/HMM use an mmu interval notifier to
+ * achieve this. SVM registers an mmu interval notifier callback with core mm,
+ * and whenever a CPU side virtual address range changes (e.g., when a range is
+ * unmapped by the CPU calling munmap), the registered callback is called from
+ * core mm. SVM then mirrors the CPU address space change on the GPU side, i.e.,
+ * unmaps or invalidates the virtual address range in the GPU page table.
+ *
+ * 2. A GPU page fault
+ *
+ * At the very beginning of a process's life, no virtual address of the process
+ * is mapped in the GPU page table. So when the GPU accesses any virtual address
+ * of the process, a GPU page fault is triggered. SVM then decides the best
+ * memory location for the fault address (mainly from performance considerations,
+ * but sometimes also correctness requirements, such as whether the GPU can
+ * perform atomic operations on a certain memory location), migrates memory if
+ * necessary, and maps the fault address into the GPU page table.
+ *
+ * 3. A CPU page fault
+ *
+ * A CPU page fault is usually managed by Linux core mm. But in a mixed CPU and
+ * GPU programming environment, the backing store of a virtual address range
+ * can be in the GPU's local memory, which is not visible to the CPU
+ * (DEVICE_PRIVATE), so the CPU page fault handler needs to migrate such pages
+ * to system memory for the CPU to be able to access them. Such memory migration
+ * is device specific. HMM has a callback (the migrate_to_ram function of
+ * dev_pagemap_ops) for the device driver to implement.
+ *
+ *
+ * Memory hints: TBD
+ * =================
+ *
+ * Memory eviction: TBD
+ * ====================
+ *
+ * Lock design
+ * ===========
+ *
+ * https://www.kernel.org/doc/Documentation/vm/hmm.rst, section "Address space mirroring
+ * implementation and API", describes the locking scheme that the driver writer has to
+ * respect. There are 3 locking mechanisms involved in this scheme:
+ *
+ * 1. Use mmap_read/write_lock to protect VMAs and cpu page table operations. Operations such
+ * as munmap/mmap and page table updates during numa balancing must hold this lock. hmm_range_fault
+ * is a helper function provided by HMM to populate the CPU page table, so it must be called
+ * with this lock held.
+ *
+ * 2. Use xe_svm::mutex to protect device side page table operations. Any attempt to bind an
+ * address range to the GPU, or invalidate an address range from the GPU, should hold this device lock.
+ *
+ * 3. In the GPU page fault handler, during the device page table update, we hold xe_svm::mutex,
+ * but we don't hold the mmap_read/write_lock. So the program's address space can change during
+ * the GPU page table update. The mmu notifier seq# is used to determine whether an unmap happened
+ * during the device page table update; if yes, then retry.
+ *
+ */
+
+#endif
-- 
2.26.3



* [PATCH 02/22] drm/xe/svm: Add svm key data structures
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
  2023-12-21  4:37 ` [PATCH 01/22] drm/xe/svm: Add SVM document Oak Zeng
@ 2023-12-21  4:37 ` Oak Zeng
  2023-12-21  4:37 ` [PATCH 03/22] drm/xe/svm: create xe svm during vm creation Oak Zeng
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Add the xe_svm and xe_svm_range data structures. Each xe_svm
represents a svm address space and maps 1:1 to the process's
mm_struct. It also maps 1:1 to the gpu xe_vm struct.

Each xe_svm_range represents a virtual address range inside
a svm address space. It is similar to the CPU's vm_area_struct,
or to the GPU xe_vma struct. It contains data to synchronize
this address range with the CPU's virtual address range, using
the mmu notifier mechanism. It can also hold this range's memory
attributes set by the user, such as preferred memory location -
this is TBD.

Each svm address space is made of many svm virtual address ranges.
All address ranges are maintained in xe_svm's interval tree.

Also add an xe_svm pointer to the xe_vm data structure, so we have
a 1:1 mapping between xe_svm and xe_vm.
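
For reference, inserting a range into the interval tree would look
roughly like the sketch below (using the generic interval_tree
helpers; the actual insertion happens later in this series):

    range->inode.start = range->start;
    range->inode.last = range->end - 1;   /* interval tree last is inclusive */
    mutex_lock(&svm->mutex);
    interval_tree_insert(&range->inode, &svm->range_tree);
    mutex_unlock(&svm->mutex);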

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.h      | 59 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm_types.h |  2 ++
 2 files changed, 61 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_svm.h

diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
new file mode 100644
index 000000000000..ba301a331f59
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -0,0 +1,59 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#ifndef __XE_SVM_H
+#define __XE_SVM_H
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/mmu_notifier.h>
+#include <linux/workqueue.h>
+#include <linux/rbtree_types.h>
+#include <linux/interval_tree.h>
+
+struct xe_vm;
+struct mm_struct;
+
+/**
+ * struct xe_svm - data structure to represent a shared
+ * virtual address space from device side. xe_svm, xe_vm
+ * and mm_struct has a 1:1:1 relationship.
+ */
+struct xe_svm {
+	/** @vm: The xe_vm address space corresponding to this xe_svm */
+	struct xe_vm *vm;
+	/** @mm: The mm_struct corresponding to this xe_svm */
+	struct mm_struct *mm;
+	/**
+	 * @mutex: A lock used by svm subsystem. It protects:
+	 * 1. below range_tree
+	 * 2. GPU page table update. Serialize all SVM GPU page table updates
+	 */
+	struct mutex mutex;
+	/**
+	 * @range_tree: Interval tree of all svm ranges in this svm
+	 */
+	struct rb_root_cached range_tree;
+};
+
+/**
+ * struct xe_svm_range - Represents a shared virtual address range.
+ */
+struct xe_svm_range {
+	/** @notifier: The mmu interval notifier used to keep track of CPU
+	 * side address range change. Driver will get a callback with this
+	 * notifier if anything changed from CPU side, such as range is
+	 * unmapped from CPU
+	 */
+	struct mmu_interval_notifier notifier;
+	/** @start: start address of this range, inclusive */
+	u64 start;
+	/** @end: end address of this range, exclusive */
+	u64 end;
+	/** @unregister_notifier_work: A worker used to unregister this notifier */
+	struct work_struct unregister_notifier_work;
+	/** @inode: used to link this range to svm's range_tree */
+	struct interval_tree_node inode;
+};
+#endif
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 63e8a50b88e9..037fb7168c63 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -17,6 +17,7 @@
 #include "xe_pt_types.h"
 #include "xe_range_fence.h"
 
+struct xe_svm;
 struct xe_bo;
 struct xe_sync_entry;
 struct xe_vm;
@@ -279,6 +280,7 @@ struct xe_vm {
 	bool batch_invalidate_tlb;
 	/** @xef: XE file handle for tracking this VM's drm client */
 	struct xe_file *xef;
+	struct xe_svm *svm;
 };
 
 /** struct xe_vma_op_map - VMA map operation */
-- 
2.26.3



* [PATCH 03/22] drm/xe/svm: create xe svm during vm creation
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
  2023-12-21  4:37 ` [PATCH 01/22] drm/xe/svm: Add SVM document Oak Zeng
  2023-12-21  4:37 ` [PATCH 02/22] drm/xe/svm: Add svm key data structures Oak Zeng
@ 2023-12-21  4:37 ` Oak Zeng
  2023-12-21  4:37 ` [PATCH 04/22] drm/xe/svm: Trace svm creation Oak Zeng
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Create the xe_svm struct during xe_vm creation.
Add the xe_svm to a global hash table so that later
we can retrieve the xe_svm using the mm_struct (the key).

Destroy the svm process during xe_vm close.

Also add a helper function to retrieve the svm struct
from the mm struct.
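
With the hash table in place, code that only has an mm_struct (e.g. the
CPU page fault handler added later in this series) can get back to the
svm, and from there to the xe_vm, roughly like the following sketch
(not part of this patch):

    struct xe_svm *svm = xe_lookup_svm_by_mm(vmf->vma->vm_mm);

    if (!svm)
        return VM_FAULT_SIGBUS;
    /* svm->vm is the xe_vm holding this process's GPU mappings */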

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c | 63 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm.h | 11 +++++++
 drivers/gpu/drm/xe/xe_vm.c  |  5 +++
 3 files changed, 79 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_svm.c

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
new file mode 100644
index 000000000000..559188471949
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -0,0 +1,63 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/mutex.h>
+#include <linux/mm_types.h>
+#include "xe_svm.h"
+
+DEFINE_HASHTABLE(xe_svm_table, XE_MAX_SVM_PROCESS);
+
+/**
+ * xe_destroy_svm() - destroy a svm process
+ *
+ * @svm: the xe_svm to destroy
+ */
+void xe_destroy_svm(struct xe_svm *svm)
+{
+	hash_del_rcu(&svm->hnode);
+	mutex_destroy(&svm->mutex);
+	kfree(svm);
+}
+
+/**
+ * xe_create_svm() - create a svm process
+ *
+ * @vm: the xe_vm that we create svm process for
+ *
+ * Return the created xe svm struct
+ */
+struct xe_svm *xe_create_svm(struct xe_vm *vm)
+{
+	struct mm_struct *mm = current->mm;
+	struct xe_svm *svm;
+
+	svm = kzalloc(sizeof(struct xe_svm), GFP_KERNEL);
+	svm->mm = mm;
+	svm->vm	= vm;
+	mutex_init(&svm->mutex);
+	/** Add svm to global xe_svm_table hash table
+	 *  use mm as key so later we can retrieve svm using mm
+	 */
+	hash_add_rcu(xe_svm_table, &svm->hnode, (uintptr_t)mm);
+	return svm;
+}
+
+/**
+ * xe_lookup_svm_by_mm() - retrieve xe_svm from mm struct
+ *
+ * @mm: the mm struct of the svm to retrieve
+ *
+ * Return the xe_svm struct pointer, or NULL if fail
+ */
+struct xe_svm *xe_lookup_svm_by_mm(struct mm_struct *mm)
+{
+	struct xe_svm *svm;
+
+	hash_for_each_possible_rcu(xe_svm_table, svm, hnode, (uintptr_t)mm)
+		if (svm->mm == mm)
+			return svm;
+
+	return NULL;
+}
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index ba301a331f59..cd3cf92f3784 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -11,10 +11,15 @@
 #include <linux/workqueue.h>
 #include <linux/rbtree_types.h>
 #include <linux/interval_tree.h>
+#include <linux/hashtable.h>
+#include <linux/types.h>
 
 struct xe_vm;
 struct mm_struct;
 
+#define XE_MAX_SVM_PROCESS 5 /* 2^5 = 32 hash buckets for SVM processes */
+extern DECLARE_HASHTABLE(xe_svm_table, XE_MAX_SVM_PROCESS);
+
 /**
  * struct xe_svm - data structure to represent a shared
  * virtual address space from device side. xe_svm, xe_vm
@@ -35,6 +40,8 @@ struct xe_svm {
 	 * @range_tree: Interval tree of all svm ranges in this svm
 	 */
 	struct rb_root_cached range_tree;
+	/** @hnode: used to add this svm to the global xe_svm_table hash table */
+	struct hlist_node hnode;
 };
 
 /**
@@ -56,4 +63,8 @@ struct xe_svm_range {
 	/** @inode: used to link this range to svm's range_tree */
 	struct interval_tree_node inode;
 };
+
+void xe_destroy_svm(struct xe_svm *svm);
+struct xe_svm *xe_create_svm(struct xe_vm *vm);
+struct xe_svm *xe_lookup_svm_by_mm(struct mm_struct *mm);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 1ca917b8315c..3c301a5c7325 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -36,6 +36,7 @@
 #include "xe_trace.h"
 #include "generated/xe_wa_oob.h"
 #include "xe_wa.h"
+#include "xe_svm.h"
 
 #define TEST_VM_ASYNC_OPS_ERROR
 
@@ -1375,6 +1376,7 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 		xe->usm.num_vm_in_non_fault_mode++;
 	mutex_unlock(&xe->usm.lock);
 
+	vm->svm = xe_create_svm(vm);
 	trace_xe_vm_create(vm);
 
 	return vm;
@@ -1495,6 +1497,9 @@ void xe_vm_close_and_put(struct xe_vm *vm)
 	for_each_tile(tile, xe, id)
 		xe_range_fence_tree_fini(&vm->rftree[id]);
 
+	if (vm->svm)
+		xe_destroy_svm(vm->svm);
+
 	xe_vm_put(vm);
 }
 
-- 
2.26.3



* [PATCH 04/22] drm/xe/svm: Trace svm creation
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (2 preceding siblings ...)
  2023-12-21  4:37 ` [PATCH 03/22] drm/xe/svm: create xe svm during vm creation Oak Zeng
@ 2023-12-21  4:37 ` Oak Zeng
  2023-12-21  4:37 ` [PATCH 05/22] drm/xe/svm: add helper to retrieve svm range from address Oak Zeng
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

xe_vm tracepoint is extended to also print svm.

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_trace.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 95163c303f3e..63867c0fa848 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -467,15 +467,17 @@ DECLARE_EVENT_CLASS(xe_vm,
 		    TP_STRUCT__entry(
 			     __field(u64, vm)
 			     __field(u32, asid)
+			     __field(u64, svm)
 			     ),
 
 		    TP_fast_assign(
 			   __entry->vm = (unsigned long)vm;
 			   __entry->asid = vm->usm.asid;
+			   __entry->svm = (unsigned long)vm->svm;
 			   ),
 
-		    TP_printk("vm=0x%016llx, asid=0x%05x",  __entry->vm,
-			      __entry->asid)
+		    TP_printk("vm=0x%016llx, asid=0x%05x, svm=0x%016llx",  __entry->vm,
+			      __entry->asid, __entry->svm)
 );
 
 DEFINE_EVENT(xe_vm, xe_vm_kill,
-- 
2.26.3



* [PATCH 05/22] drm/xe/svm: add helper to retrieve svm range from address
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (3 preceding siblings ...)
  2023-12-21  4:37 ` [PATCH 04/22] drm/xe/svm: Trace svm creation Oak Zeng
@ 2023-12-21  4:37 ` Oak Zeng
  2023-12-21  4:37 ` [PATCH 06/22] drm/xe/svm: Introduce a helper to build sg table from hmm range Oak Zeng
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

All valid virtual address ranges are maintained in the svm's
range_tree. This function iterates the svm's range tree and
returns the svm range that contains a specific address.
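
The intended caller is the GPU page fault handler, along the lines of
the sketch below (the real handler, added later in this series, also
creates a new range when none exists yet):

    struct xe_svm_range *range;

    range = xe_svm_range_from_addr(svm, fault_addr);
    if (!range)
        /* not yet registered; a later patch creates the range here */
        return -EFAULT;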

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.h       |  2 ++
 drivers/gpu/drm/xe/xe_svm_range.c | 32 +++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_svm_range.c

diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index cd3cf92f3784..3ed106ecc02b 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -67,4 +67,6 @@ struct xe_svm_range {
 void xe_destroy_svm(struct xe_svm *svm);
 struct xe_svm *xe_create_svm(struct xe_vm *vm);
 struct xe_svm *xe_lookup_svm_by_mm(struct mm_struct *mm);
+struct xe_svm_range *xe_svm_range_from_addr(struct xe_svm *svm,
+								unsigned long addr);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_svm_range.c b/drivers/gpu/drm/xe/xe_svm_range.c
new file mode 100644
index 000000000000..d8251d38f65e
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_svm_range.c
@@ -0,0 +1,32 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/interval_tree.h>
+#include <linux/container_of.h>
+#include <linux/mutex.h>
+#include "xe_svm.h"
+
+/**
+ * xe_svm_range_from_addr() - retrieve the svm_range that contains a virtual address
+ *
+ * @svm: svm that the virtual address belongs to
+ * @addr: the virtual address to retrieve svm_range for
+ *
+ * Return: the svm range found,
+ * or NULL if no range is found
+ */
+struct xe_svm_range *xe_svm_range_from_addr(struct xe_svm *svm,
+									unsigned long addr)
+{
+	struct interval_tree_node *node;
+
+	mutex_lock(&svm->mutex);
+	node = interval_tree_iter_first(&svm->range_tree, addr, addr);
+	mutex_unlock(&svm->mutex);
+	if (!node)
+		return NULL;
+
+	return container_of(node, struct xe_svm_range, inode);
+}
-- 
2.26.3



* [PATCH 06/22] drm/xe/svm: Introduce a helper to build sg table from hmm range
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (4 preceding siblings ...)
  2023-12-21  4:37 ` [PATCH 05/22] drm/xe/svm: add helper to retrieve svm range from address Oak Zeng
@ 2023-12-21  4:37 ` Oak Zeng
  2023-12-21  4:37 ` [PATCH 07/22] drm/xe/svm: Add helper for binding hmm range to gpu Oak Zeng
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Introduce the xe_svm_build_sg helper function to build a scatter
gather table from a hmm_range struct. This is preparation work
for binding a hmm range to the gpu.
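
Expected usage (a sketch only; the caller owns the sg_table storage and
must free it with sg_free_table(), as noted in the kernel-doc below):

    struct sg_table st;
    int ret;

    ret = xe_svm_build_sg(range, &st);
    if (ret)
        return ret;

    /* hand the table to the PTE emission code, e.g. via xe_res_first_sg() */

    sg_free_table(&st);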

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c | 52 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm.h |  3 +++
 2 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 559188471949..ab3cc2121869 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -6,6 +6,8 @@
 #include <linux/mutex.h>
 #include <linux/mm_types.h>
 #include "xe_svm.h"
+#include <linux/hmm.h>
+#include <linux/scatterlist.h>
 
 DEFINE_HASHTABLE(xe_svm_table, XE_MAX_SVM_PROCESS);
 
@@ -61,3 +63,53 @@ struct xe_svm *xe_lookup_svm_by_mm(struct mm_struct *mm)
 
 	return NULL;
 }
+
+/**
+ * xe_svm_build_sg() - build a scatter gather table for all the physical pages/pfn
+ * in a hmm_range.
+ *
+ * @range: the hmm range that we build the sg table from. range->hmm_pfns[]
+ * has the pfn numbers of pages that back up this hmm address range.
+ * @st: pointer to the sg table.
+ *
+ * All the contiguous pfns will be collapsed into one entry in
+ * the scatter gather table. This is for the convenience of
+ * later on operations to bind address range to GPU page table.
+ *
+ * This function allocates the storage of the sg table. It is
+ * caller's responsibility to free it calling sg_free_table.
+ *
+ * Returns 0 if successful; -ENOMEM if fails to allocate memory
+ */
+int xe_svm_build_sg(struct hmm_range *range,
+			     struct sg_table *st)
+{
+	struct scatterlist *sg;
+	u64 i, npages;
+
+	sg = NULL;
+	st->nents = 0;
+	npages = ((range->end - 1) >> PAGE_SHIFT) - (range->start >> PAGE_SHIFT) + 1;
+
+	if (unlikely(sg_alloc_table(st, npages, GFP_KERNEL)))
+		return -ENOMEM;
+
+	for (i = 0; i < npages; i++) {
+		unsigned long addr = range->hmm_pfns[i];
+
+		if (sg && (addr == (sg_dma_address(sg) + sg->length))) {
+			sg->length += PAGE_SIZE;
+			sg_dma_len(sg) += PAGE_SIZE;
+			continue;
+		}
+
+		sg = sg ? sg_next(sg) : st->sgl;
+		sg_dma_address(sg) = addr;
+		sg_dma_len(sg) = PAGE_SIZE;
+		sg->length = PAGE_SIZE;
+		st->nents++;
+	}
+
+	sg_mark_end(sg);
+	return 0;
+}
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 3ed106ecc02b..191bce6425db 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -13,6 +13,8 @@
 #include <linux/interval_tree.h>
 #include <linux/hashtable.h>
 #include <linux/types.h>
+#include <linux/hmm.h>
+#include "xe_device_types.h"
 
 struct xe_vm;
 struct mm_struct;
@@ -69,4 +71,5 @@ struct xe_svm *xe_create_svm(struct xe_vm *vm);
 struct xe_svm *xe_lookup_svm_by_mm(struct mm_struct *mm);
 struct xe_svm_range *xe_svm_range_from_addr(struct xe_svm *svm,
 								unsigned long addr);
+int xe_svm_build_sg(struct hmm_range *range, struct sg_table *st);
 #endif
-- 
2.26.3



* [PATCH 07/22] drm/xe/svm: Add helper for binding hmm range to gpu
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (5 preceding siblings ...)
  2023-12-21  4:37 ` [PATCH 06/22] drm/xe/svm: Introduce a helper to build sg table from hmm range Oak Zeng
@ 2023-12-21  4:37 ` Oak Zeng
  2023-12-21  4:37 ` [PATCH 08/22] drm/xe/svm: Add helper to invalidate svm range from GPU Oak Zeng
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Add a helper function xe_bind_svm_range to bind a svm range
to the gpu. A temporary xe_vma is created locally to re-use
the existing page table update functions, which are vma-based.

The svm page table update lock design is different from
userptr and bo page table update. A xe_pt_svm_pre_commit
function is introduced for svm range pre-commitment.

A hmm_range pointer is added to xe_vma struct.
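
For context, the expected calling sequence in the GPU page fault path
looks roughly like the sketch below (simplified; the real fault handler
comes later in the series):

    struct hmm_range range = {
        .notifier = &svm_range->notifier,
        .start = start,
        .end = end,
        .hmm_pfns = pfns,
        .default_flags = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
    };
    int err;

again:
    range.notifier_seq = mmu_interval_read_begin(range.notifier);
    mmap_read_lock(mm);
    err = hmm_range_fault(&range);
    mmap_read_unlock(mm);
    if (err == -EBUSY)
        goto again;
    if (err)
        return err;

    err = xe_bind_svm_range(vm, tile, &range, flags);
    if (err == -EAGAIN)     /* concurrent CPU page table update */
        goto again;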

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c       | 101 ++++++++++++++++++++++++++++++-
 drivers/gpu/drm/xe/xe_pt.h       |   4 ++
 drivers/gpu/drm/xe/xe_vm_types.h |  10 +++
 3 files changed, 113 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index de1030a47588..65cfac88ab2f 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -17,6 +17,7 @@
 #include "xe_trace.h"
 #include "xe_ttm_stolen_mgr.h"
 #include "xe_vm.h"
+#include "xe_svm.h"
 
 struct xe_pt_dir {
 	struct xe_pt pt;
@@ -617,7 +618,10 @@ xe_pt_stage_bind(struct xe_tile *tile, struct xe_vma *vma,
 	xe_bo_assert_held(bo);
 
 	if (!xe_vma_is_null(vma)) {
-		if (xe_vma_is_userptr(vma))
+		if (vma->svm_sg)
+			xe_res_first_sg(vma->svm_sg, 0, xe_vma_size(vma),
+					&curs);
+		else if (xe_vma_is_userptr(vma))
 			xe_res_first_sg(vma->userptr.sg, 0, xe_vma_size(vma),
 					&curs);
 		else if (xe_bo_is_vram(bo) || xe_bo_is_stolen(bo))
@@ -1046,6 +1050,28 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
 	return 0;
 }
 
+static int xe_pt_svm_pre_commit(struct xe_migrate_pt_update *pt_update)
+{
+	struct xe_vma *vma = pt_update->vma;
+	struct hmm_range *range = vma->hmm_range;
+
+	if (mmu_interval_read_retry(range->notifier,
+		    range->notifier_seq)) {
+		/*
+		 * FIXME: is this really necessary? We didn't update GPU
+		 * page table yet...
+		 */
+		xe_vm_invalidate_vma(vma);
+		return -EAGAIN;
+	}
+	return 0;
+}
+
+static const struct xe_migrate_pt_update_ops svm_bind_ops = {
+	.populate = xe_vm_populate_pgtable,
+	.pre_commit = xe_pt_svm_pre_commit,
+};
+
 static const struct xe_migrate_pt_update_ops bind_ops = {
 	.populate = xe_vm_populate_pgtable,
 	.pre_commit = xe_pt_pre_commit,
@@ -1197,7 +1223,8 @@ __xe_pt_bind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queue
 	struct xe_vm_pgtable_update entries[XE_VM_MAX_LEVEL * 2 + 1];
 	struct xe_pt_migrate_pt_update bind_pt_update = {
 		.base = {
-			.ops = xe_vma_is_userptr(vma) ? &userptr_bind_ops : &bind_ops,
+			.ops = vma->svm_sg ? &svm_bind_ops :
+					(xe_vma_is_userptr(vma) ? &userptr_bind_ops : &bind_ops),
 			.vma = vma,
 			.tile_id = tile->id,
 		},
@@ -1651,3 +1678,73 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
 
 	return fence;
 }
+
+/**
+ * xe_bind_svm_range() - bind an address range to vm
+ *
+ * @vm: the vm to bind this address range
+ * @tile: the tile to bind this address range to
+ * @range: a hmm_range which includes all the information
+ * needed for binding: virtual address range and physical
+ * pfns to back up this virtual address range.
+ * @flags: the binding flags to set in pte
+ *
+ * This is a helper function used by svm sub-system
+ * to bind a svm range to gpu vm. svm sub-system
+ * doesn't have xe_vma, thus helpers such as
+ * __xe_pt_bind_vma can't be used directly. So this
+ * helper is written for svm sub-system to use.
+ *
+ * This is a synchronous function. When this function
+ * returns, either the svm range is bound to GPU, or
+ * error happened.
+ *
+ * Return: 0 for success or error code for failure
+ * If -EAGAIN is returned, it means the mmu notifier was called
+ * (i.e., there was a concurrent cpu page table update) during
+ * this function; the caller has to retry hmm_range_fault
+ */
+int xe_bind_svm_range(struct xe_vm *vm, struct xe_tile *tile,
+			struct hmm_range *range, u64 flags)
+{
+	struct dma_fence *fence = NULL;
+	struct xe_svm *svm = vm->svm;
+	int ret = 0;
+	/*
+	 * Create a temp vma to reuse page table helpers such as
+	 * __xe_pt_bind_vma
+	 */
+	struct xe_vma vma = {
+		.gpuva = {
+			.va = {
+				.addr = range->start,
+				.range = range->end - range->start + 1,
+			},
+			.vm = &vm->gpuvm,
+			.flags = flags,
+		},
+		.tile_mask = 0x1 << tile->id,
+		.hmm_range = range,
+	};
+
+	xe_svm_build_sg(range, &vma.svm_sgt);
+	vma.svm_sg = &vma.svm_sgt;
+
+	mutex_lock(&svm->mutex);
+	if (mmu_interval_read_retry(range->notifier, range->notifier_seq)) {
+		ret = -EAGAIN;
+		goto unlock;
+	}
+	fence = __xe_pt_bind_vma(tile, &vma, vm->q[tile->id], NULL, 0, false);
+
+unlock:
+	mutex_unlock(&svm->mutex);
+	sg_free_table(vma.svm_sg);
+
+	if (IS_ERR(fence))
+		return PTR_ERR(fence);
+
+	dma_fence_wait(fence, false);
+	dma_fence_put(fence);
+	return ret;
+}
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 71a4fbfcff43..775d08707466 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -17,6 +17,8 @@ struct xe_sync_entry;
 struct xe_tile;
 struct xe_vm;
 struct xe_vma;
+struct xe_svm;
+struct hmm_range;
 
 /* Largest huge pte is currently 1GiB. May become device dependent. */
 #define MAX_HUGEPTE_LEVEL 2
@@ -45,4 +47,6 @@ __xe_pt_unbind_vma(struct xe_tile *tile, struct xe_vma *vma, struct xe_exec_queu
 
 bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
 
+int xe_bind_svm_range(struct xe_vm *vm, struct xe_tile *tile,
+			struct hmm_range *range, u64 flags);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_vm_types.h b/drivers/gpu/drm/xe/xe_vm_types.h
index 037fb7168c63..deefe2364667 100644
--- a/drivers/gpu/drm/xe/xe_vm_types.h
+++ b/drivers/gpu/drm/xe/xe_vm_types.h
@@ -21,6 +21,7 @@ struct xe_svm;
 struct xe_bo;
 struct xe_sync_entry;
 struct xe_vm;
+struct hmm_range;
 
 #define TEST_VM_ASYNC_OPS_ERROR
 #define FORCE_ASYNC_OP_ERROR	BIT(31)
@@ -112,6 +113,15 @@ struct xe_vma {
 	 * user pointers
 	 */
 	struct xe_userptr userptr;
+
+	/**
+	 * @svm_sgt: a scatter gather table to save svm virtual address range's
+	 * pfns
+	 */
+	struct sg_table svm_sgt;
+	struct sg_table *svm_sg;
+	/** @hmm_range: hmm range of this pt update, used by svm */
+	struct hmm_range *hmm_range;
 };
 
 struct xe_device;
-- 
2.26.3



* [PATCH 08/22] drm/xe/svm: Add helper to invalidate svm range from GPU
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (6 preceding siblings ...)
  2023-12-21  4:37 ` [PATCH 07/22] drm/xe/svm: Add helper for binding hmm range to gpu Oak Zeng
@ 2023-12-21  4:37 ` Oak Zeng
  2023-12-21  4:37 ` [PATCH 09/22] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

An svm subsystem friendly function is added for svm range invalidation
purposes. The svm subsystem doesn't maintain xe_vma, so a temporary
xe_vma is used to call the function xe_vm_invalidate_vma.

Not sure whether this works or not; will have to test. If a temporary
vma doesn't work, we will have to call the zap_pte/tlb_inv functions
directly.
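
The expected caller is the mmu interval notifier invalidate callback
added later in this series; roughly as sketched below (names and
details illustrative, locking and blockable handling omitted):

    static bool xe_svm_range_invalidate(struct mmu_interval_notifier *mni,
                                        const struct mmu_notifier_range *range,
                                        unsigned long cur_seq)
    {
        struct xe_svm_range *svm_range =
                container_of(mni, struct xe_svm_range, notifier);
        struct xe_svm *svm = xe_lookup_svm_by_mm(range->mm);

        mmu_interval_set_seq(mni, cur_seq);
        xe_invalidate_svm_range(svm->vm, range->start,
                                range->end - range->start);
        return true;
    }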

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_pt.c | 33 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_pt.h |  1 +
 2 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
index 65cfac88ab2f..9805b402ebca 100644
--- a/drivers/gpu/drm/xe/xe_pt.c
+++ b/drivers/gpu/drm/xe/xe_pt.c
@@ -1748,3 +1748,36 @@ int xe_bind_svm_range(struct xe_vm *vm, struct xe_tile *tile,
 	dma_fence_put(fence);
 	return ret;
 }
+
+/**
+ * xe_invalidate_svm_range() - a helper to invalidate a svm address range
+ *
+ * @vm: The vm that the address range belongs to
+ * @start: start of the virtual address range
+ * @size: size of the virtual address range
+ *
+ * This is a helper function supposed to be used by svm subsystem.
+ * svm subsystem doesn't maintain xe_vma, so we create a temporary
+ * xe_vma structure so we can reuse xe_vm_invalidate_vma().
+ */
+void xe_invalidate_svm_range(struct xe_vm *vm, u64 start, u64 size)
+{
+	struct xe_vma vma = {
+		.gpuva = {
+			.va = {
+				.addr = start,
+				.range = size,
+			},
+			.vm = &vm->gpuvm,
+		},
+		/** invalidate from all tiles
+		 *  FIXME: We used temporary vma in xe_bind_svm_range, so
+		 *  we lost track of which tile we are bound to. Does
+		 *  setting tile_present to all tiles cause a problem
+		 *  in xe_vm_invalidate_vma()?
+		 */
+		.tile_present = BIT(vm->xe->info.tile_count) - 1,
+	};
+
+	xe_vm_invalidate_vma(&vma);
+}
diff --git a/drivers/gpu/drm/xe/xe_pt.h b/drivers/gpu/drm/xe/xe_pt.h
index 775d08707466..42d495997635 100644
--- a/drivers/gpu/drm/xe/xe_pt.h
+++ b/drivers/gpu/drm/xe/xe_pt.h
@@ -49,4 +49,5 @@ bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma);
 
 int xe_bind_svm_range(struct xe_vm *vm, struct xe_tile *tile,
 			struct hmm_range *range, u64 flags);
+void xe_invalidate_svm_range(struct xe_vm *vm, u64 start, u64 size);
 #endif
-- 
2.26.3



* [PATCH 09/22] drm/xe/svm: Remap and provide memmap backing for GPU vram
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (7 preceding siblings ...)
  2023-12-21  4:37 ` [PATCH 08/22] drm/xe/svm: Add helper to invalidate svm range from GPU Oak Zeng
@ 2023-12-21  4:37 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 10/22] drm/xe/svm: Introduce svm migration function Oak Zeng
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:37 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Memory remap GPU vram using devm_memremap_pages, so each GPU vram
page is backed by a struct page.

Those struct pages are created to allow hmm to migrate buffers between
GPU vram and CPU system memory using the existing Linux migration
mechanism (i.e., the one used to migrate between CPU system memory and
hard disk).

This is preparation work to enable svm (shared virtual memory) through
the Linux kernel hmm framework. The memory remap's page map type is set
to MEMORY_DEVICE_PRIVATE for now. This means that even though each GPU
vram page gets a struct page and can be mapped in the CPU page table,
such pages are treated as the GPU's private resource, so the CPU can't
access them. If the CPU accesses such a page, a page fault is triggered
and the page will be migrated to system memory.

For GPU devices which support a coherent memory protocol between CPU and
GPU (such as the CXL and CAPI protocols), we can remap device memory as
MEMORY_DEVICE_COHERENT. This is TBD.
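
For illustration, the hpa_base field added below is what lets us later
translate between a ZONE_DEVICE struct page and its offset inside the
vram region, roughly:

    /* offset of a device page inside the vram region (sketch only) */
    u64 offset = ((u64)page_to_pfn(page) << PAGE_SHIFT) - mr->hpa_base;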

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h |  8 +++
 drivers/gpu/drm/xe/xe_mmio.c         |  7 +++
 drivers/gpu/drm/xe/xe_svm.h          |  2 +
 drivers/gpu/drm/xe/xe_svm_devmem.c   | 87 ++++++++++++++++++++++++++++
 4 files changed, 104 insertions(+)
 create mode 100644 drivers/gpu/drm/xe/xe_svm_devmem.c

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 71f23ac365e6..c67c28f04d2f 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -99,6 +99,14 @@ struct xe_mem_region {
 	resource_size_t actual_physical_size;
 	/** @mapping: pointer to VRAM mappable space */
 	void *__iomem mapping;
+	/** @pagemap: Used to remap device memory as ZONE_DEVICE */
+	struct dev_pagemap pagemap;
+	/**
+	 * @hpa_base: base host physical address
+	 *
+	 * This is generated when remap device memory as ZONE_DEVICE
+	 */
+	resource_size_t hpa_base;
 };
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index f660cfb79f50..cfe25a3c7059 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -21,6 +21,7 @@
 #include "xe_macros.h"
 #include "xe_module.h"
 #include "xe_tile.h"
+#include "xe_svm.h"
 
 #define XEHP_MTCFG_ADDR		XE_REG(0x101800)
 #define TILE_COUNT		REG_GENMASK(15, 8)
@@ -285,6 +286,7 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 		}
 
 		io_size -= min_t(u64, tile_size, io_size);
+		xe_svm_devm_add(tile, &tile->mem.vram);
 	}
 
 	xe->mem.vram.actual_physical_size = total_size;
@@ -353,10 +355,15 @@ void xe_mmio_probe_tiles(struct xe_device *xe)
 static void mmio_fini(struct drm_device *drm, void *arg)
 {
 	struct xe_device *xe = arg;
+	struct xe_tile *tile;
+	u8 id;
 
 	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
 	if (xe->mem.vram.mapping)
 		iounmap(xe->mem.vram.mapping);
+	for_each_tile(tile, xe, id) {
+		xe_svm_devm_remove(xe, &tile->mem.vram);
+	}
 }
 
 static int xe_verify_lmem_ready(struct xe_device *xe)
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 191bce6425db..b54f7714a1fc 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -72,4 +72,6 @@ struct xe_svm *xe_lookup_svm_by_mm(struct mm_struct *mm);
 struct xe_svm_range *xe_svm_range_from_addr(struct xe_svm *svm,
 								unsigned long addr);
 int xe_svm_build_sg(struct hmm_range *range, struct sg_table *st);
+int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mem);
+void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mem);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_svm_devmem.c b/drivers/gpu/drm/xe/xe_svm_devmem.c
new file mode 100644
index 000000000000..cf7882830247
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_svm_devmem.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/mm_types.h>
+#include <linux/sched/mm.h>
+
+#include "xe_device_types.h"
+#include "xe_trace.h"
+
+
+static vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
+{
+	return 0;
+}
+
+static void xe_devm_page_free(struct page *page)
+{
+}
+
+static const struct dev_pagemap_ops xe_devm_pagemap_ops = {
+	.page_free = xe_devm_page_free,
+	.migrate_to_ram = xe_devm_migrate_to_ram,
+};
+
+/**
+ * xe_svm_devm_add() - Remap and provide memmap backing for device memory
+ * @tile: tile that the memory region belongs to
+ * @mr: memory region to remap
+ *
+ * This remaps device memory to the host physical address space and creates
+ * struct pages to back the device memory.
+ *
+ * Return: 0 on success, standard error code otherwise
+ */
+int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mr)
+{
+	struct device *dev = &to_pci_dev(tile->xe->drm.dev)->dev;
+	struct resource *res;
+	void *addr;
+	int ret;
+
+	res = devm_request_free_mem_region(dev, &iomem_resource,
+					   mr->usable_size);
+	if (IS_ERR(res)) {
+		ret = PTR_ERR(res);
+		return ret;
+	}
+
+	mr->pagemap.type = MEMORY_DEVICE_PRIVATE;
+	mr->pagemap.range.start = res->start;
+	mr->pagemap.range.end = res->end;
+	mr->pagemap.nr_range = 1;
+	mr->pagemap.ops = &xe_devm_pagemap_ops;
+	mr->pagemap.owner = tile->xe->drm.dev;
+	addr = devm_memremap_pages(dev, &mr->pagemap);
+	if (IS_ERR(addr)) {
+		devm_release_mem_region(dev, res->start, resource_size(res));
+		ret = PTR_ERR(addr);
+		drm_err(&tile->xe->drm, "Failed to remap tile %d memory, errno %d\n",
+				tile->id, ret);
+		return ret;
+	}
+	mr->hpa_base = res->start;
+
+	drm_info(&tile->xe->drm, "Added tile %d memory [%llx-%llx] to devm, remapped to %pr\n",
+			tile->id, mr->io_start, mr->io_start + mr->usable_size, res);
+	return 0;
+}
+
+/**
+ * xe_svm_devm_remove() - Unmap device memory and free resources
+ * @xe: xe device
+ * @mr: memory region to remove
+ */
+void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mr)
+{
+	struct device *dev = &to_pci_dev(xe->drm.dev)->dev;
+
+	if (mr->hpa_base) {
+		devm_memunmap_pages(dev, &mr->pagemap);
+		devm_release_mem_region(dev, mr->pagemap.range.start,
+			mr->pagemap.range.end - mr->pagemap.range.start + 1);
+	}
+}
+
-- 
2.26.3



* [PATCH 10/22] drm/xe/svm: Introduce svm migration function
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (8 preceding siblings ...)
  2023-12-21  4:37 ` [PATCH 09/22] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 11/22] drm/xe/svm: implement functions to allocate and free device memory Oak Zeng
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Introduce the xe_migrate_svm function for data migration.
This function is similar to the xe_migrate_copy function
but has different parameters. Instead of BO and ttm
resource parameters, it takes the source and destination
buffers' dpa (device physical) addresses as parameters.
This function is intended to be used by the svm sub-system,
which doesn't have the BO and TTM concepts.
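
A typical call from the svm migration code would look roughly like the
sketch below (tile->migrate is assumed to be the tile's migrate
context):

    fence = xe_migrate_svm(tile->migrate,
                           src_dpa, false,      /* source in system memory */
                           dst_dpa, true,       /* destination in vram */
                           npages << PAGE_SHIFT);
    if (IS_ERR(fence))
        return PTR_ERR(fence);
    dma_fence_wait(fence, false);
    dma_fence_put(fence);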

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_migrate.c | 213 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_migrate.h |   7 ++
 2 files changed, 220 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index adf1dab5eba2..425de8e44deb 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -387,6 +387,37 @@ static u64 xe_migrate_res_sizes(struct xe_device *xe, struct xe_res_cursor *cur)
 		     cur->remaining);
 }
 
+/**
+ * pte_update_cmd_size() - calculate the batch buffer command size
+ * to update a flat page table.
+ *
+ * @size: The virtual address range size of the page table to update
+ *
+ * The page table to update is supposed to be a flat 1 level page
+ * table with all entries pointing to 4k pages.
+ *
+ * Return the number of dwords of the update command
+ */
+static u32 pte_update_cmd_size(u64 size)
+{
+	u32 dword;
+	u64 entries = DIV_ROUND_UP(size, XE_PAGE_SIZE);
+
+	XE_WARN_ON(size > MAX_PREEMPTDISABLE_TRANSFER);
+	/*
+	 * MI_STORE_DATA_IMM command is used to update page table. Each
+	 * instruction can update at most 0x1ff pte entries. To update
+	 * n (n <= 0x1ff) pte entries, we need:
+	 * 1 dword for the MI_STORE_DATA_IMM command header (opcode etc)
+	 * 2 dword for the page table's physical location
+	 * 2*n dword for value of pte to fill (each pte entry is 2 dwords)
+	 */
+	dword = (1 + 2) * DIV_ROUND_UP(entries, 0x1ff);
+	dword += entries * 2;
+
+	return dword;
+}
+
 static u32 pte_update_size(struct xe_migrate *m,
 			   bool is_vram,
 			   struct ttm_resource *res,
@@ -492,6 +523,48 @@ static void emit_pte(struct xe_migrate *m,
 	}
 }
 
+/**
+ * build_pt_update_batch_sram() - build batch buffer commands to update
+ * migration vm page table for system memory
+ *
+ * @m: The migration context
+ * @bb: The batch buffer which hold the page table update commands
+ * @pt_offset: The offset of page table to update, in byte
+ * @dpa: device physical address you want the page table to point to
+ * @size: size of the virtual address space you want the page table to cover
+ */
+static void build_pt_update_batch_sram(struct xe_migrate *m,
+		     struct xe_bb *bb, u32 pt_offset,
+		     u64 dpa, u32 size)
+{
+	u16 pat_index = tile_to_xe(m->tile)->pat.idx[XE_CACHE_WB];
+	u32 ptes;
+
+	ptes = DIV_ROUND_UP(size, XE_PAGE_SIZE);
+	while (ptes) {
+		u32 chunk = min(0x1ffU, ptes);
+
+		bb->cs[bb->len++] = MI_STORE_DATA_IMM | MI_SDI_NUM_QW(chunk);
+		bb->cs[bb->len++] = pt_offset;
+		bb->cs[bb->len++] = 0;
+
+		pt_offset += chunk * 8;
+		ptes -= chunk;
+
+		while (chunk--) {
+			u64 addr;
+
+			addr = dpa & PAGE_MASK;
+			addr = m->q->vm->pt_ops->pte_encode_addr(m->tile->xe,
+								 addr, pat_index,
+								 0, false, 0);
+			bb->cs[bb->len++] = lower_32_bits(addr);
+			bb->cs[bb->len++] = upper_32_bits(addr);
+			dpa += XE_PAGE_SIZE;
+		}
+	}
+}
+
 #define EMIT_COPY_CCS_DW 5
 static void emit_copy_ccs(struct xe_gt *gt, struct xe_bb *bb,
 			  u64 dst_ofs, bool dst_is_indirect,
@@ -808,6 +881,146 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
 	return fence;
 }
 
+/**
+ * xe_migrate_svm() - A migrate function used by SVM subsystem
+ *
+ * @m: The migration context
+ * @src_dpa: device physical start address of source, from GPU's point of view
+ * @src_is_vram: True if source buffer is in vram.
+ * @dst_dpa: device physical start address of destination, from GPU's point of view
+ * @dst_is_vram: True if destination buffer is in vram.
+ * @size: The size of data to copy.
+ *
+ * Copy @size bytes of data from @src_dpa to @dst_dpa. The functionality
+ * and behavior of this function is similar to xe_migrate_copy function, but
+ * the interface is different. This function is a helper function supposed to
+ * be used by the SVM subsystem. Since in the SVM subsystem there is no buffer object
+ * and ttm, there is no src/dst bo as function input. Instead, we directly use
+ * the src/dst physical addresses as function input.
+ *
+ * Since the backing store of any user malloc'ed or mmap'ed memory can be placed in
+ * system memory, it cannot be compressed. Thus this function doesn't need
+ * to consider copying CCS (compression control surface) data as xe_migrate_copy does.
+ *
+ * This function assumes the source buffer and destination buffer are all physically
+ * contiguous.
+ *
+ * We use gpu blitter to copy data. Source and destination are first mapped to
+ * migration vm which is a flat one level (L0) page table, then blitter is used to
+ * perform the copy.
+ *
+ * Return: Pointer to a dma_fence representing the last copy batch, or
+ * an error pointer on failure. If there is a failure, any copy operation
+ * started by the function call has been synced.
+ */
+struct dma_fence *xe_migrate_svm(struct xe_migrate *m,
+				  u64 src_dpa,
+				  bool src_is_vram,
+				  u64 dst_dpa,
+				  bool dst_is_vram,
+				  u64 size)
+{
+#define NUM_PT_PER_BLIT (MAX_PREEMPTDISABLE_TRANSFER / SZ_2M)
+	struct xe_gt *gt = m->tile->primary_gt;
+	struct xe_device *xe = gt_to_xe(gt);
+	struct dma_fence *fence = NULL;
+	u64 src_L0_ofs, dst_L0_ofs;
+	u64 round_update_size;
+	/* A slot is a 4K page of page table, covering 2M of virtual address space */
+	u32 pt_slot;
+	int err;
+
+	while (size) {
+		u32 batch_size = 2; /* arb_clear() + MI_BATCH_BUFFER_END */
+		struct xe_sched_job *job;
+		struct xe_bb *bb;
+		u32 update_idx;
+
+		/* Copy at most MAX_PREEMPTDISABLE_TRANSFER bytes per pass. Why? */
+		round_update_size = min_t(u64, size, MAX_PREEMPTDISABLE_TRANSFER);
+
+		/* src pte update*/
+		if (!src_is_vram)
+			batch_size += pte_update_cmd_size(round_update_size);
+		/* dst pte update*/
+		if (!dst_is_vram)
+			batch_size += pte_update_cmd_size(round_update_size);
+
+		/* Copy command size*/
+		batch_size += EMIT_COPY_DW;
+
+		bb = xe_bb_new(gt, batch_size, true);
+		if (IS_ERR(bb)) {
+			err = PTR_ERR(bb);
+			goto err_sync;
+		}
+
+		if (!src_is_vram) {
+			pt_slot = 0;
+			build_pt_update_batch_sram(m, bb, pt_slot * XE_PAGE_SIZE,
+					src_dpa, round_update_size);
+			src_L0_ofs = xe_migrate_vm_addr(pt_slot, 0);
+		}
+		else
+			src_L0_ofs = xe_migrate_vram_ofs(xe, src_dpa);
+
+		if (!dst_is_vram) {
+			pt_slot = NUM_PT_PER_BLIT;
+			build_pt_update_batch_sram(m, bb, pt_slot * XE_PAGE_SIZE,
+					dst_dpa, round_update_size);
+			dst_L0_ofs = xe_migrate_vm_addr(pt_slot, 0);
+		}
+		else
+			dst_L0_ofs = xe_migrate_vram_ofs(xe, dst_dpa);
+
+
+		bb->cs[bb->len++] = MI_BATCH_BUFFER_END;
+		update_idx = bb->len;
+
+		emit_copy(gt, bb, src_L0_ofs, dst_L0_ofs, round_update_size,
+			  XE_PAGE_SIZE);
+
+		mutex_lock(&m->job_mutex);
+		job = xe_bb_create_migration_job(m->q, bb,
+						 xe_migrate_batch_base(m, true),
+						 update_idx);
+		if (IS_ERR(job)) {
+			err = PTR_ERR(job);
+			goto err;
+		}
+
+		xe_sched_job_add_migrate_flush(job, 0);
+		xe_sched_job_arm(job);
+		dma_fence_put(fence);
+		fence = dma_fence_get(&job->drm.s_fence->finished);
+		xe_sched_job_push(job);
+		dma_fence_put(m->fence);
+		m->fence = dma_fence_get(fence);
+
+		mutex_unlock(&m->job_mutex);
+
+		xe_bb_free(bb, fence);
+		size -= round_update_size;
+		src_dpa += round_update_size;
+		dst_dpa += round_update_size;
+		continue;
+
+err:
+		mutex_unlock(&m->job_mutex);
+		xe_bb_free(bb, NULL);
+
+err_sync:
+		/* Sync partial copy if any. FIXME: under job_mutex? */
+		if (fence) {
+			dma_fence_wait(fence, false);
+			dma_fence_put(fence);
+		}
+
+		return ERR_PTR(err);
+	}
+
+	return fence;
+}
 static void emit_clear_link_copy(struct xe_gt *gt, struct xe_bb *bb, u64 src_ofs,
 				 u32 size, u32 pitch)
 {
diff --git a/drivers/gpu/drm/xe/xe_migrate.h b/drivers/gpu/drm/xe/xe_migrate.h
index 951f19318ea4..a532760ae1fa 100644
--- a/drivers/gpu/drm/xe/xe_migrate.h
+++ b/drivers/gpu/drm/xe/xe_migrate.h
@@ -88,6 +88,13 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
 				  struct ttm_resource *dst,
 				  bool copy_only_ccs);
 
+struct dma_fence *xe_migrate_svm(struct xe_migrate *m,
+				  u64 src_dpa,
+				  bool src_is_vram,
+				  u64 dst_dpa,
+				  bool dst_is_vram,
+				  u64 size);
+
 struct dma_fence *xe_migrate_clear(struct xe_migrate *m,
 				   struct xe_bo *bo,
 				   struct ttm_resource *dst);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 11/22] drm/xe/svm: implement functions to allocate and free device memory
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (9 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 10/22] drm/xe/svm: Introduce svm migration function Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 12/22] drm/xe/svm: Trace buddy block allocation and free Oak Zeng
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Function xe_devm_alloc_pages allocates pages from drm buddy and performs
housekeeping work for all the pages allocated, such as getting a page
refcount, keeping a bitmap of all pages to denote whether a page is in
use, and putting pages on a drm lru list for eviction purposes.

Function xe_devm_free_blocks returns all memory blocks to the drm buddy
allocator.

Function xe_devm_page_free is a callback function from the hmm layer. It
is called whenever a page's refcount reaches 1. This function clears the
bit of this page in the bitmap. If all the bits in the bitmap are
cleared, all the pages of this memory block have been freed and we return
the block back to drm buddy.
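
For illustration, a rough caller-side sketch of how these helpers fit
together (the caller function below is hypothetical and error handling is
trimmed; only xe_devm_alloc_pages()/xe_devm_free_blocks() come from this
patch):

    /* Hypothetical caller: back npages of an svm range with vram */
    static int example_back_range_with_vram(struct xe_tile *tile,
                                            unsigned long npages,
                                            unsigned long *pfns)
    {
            LIST_HEAD(blocks);
            int ret;

            /* pfns must have room for at least npages entries */
            ret = xe_devm_alloc_pages(tile, npages, &blocks, pfns);
            if (ret)
                    return ret;

            /*
             * ... hand pfns to the migrate code; each page was
             * refcounted and locked by zone_device_page_init() ...
             */

            /* on error or teardown, give all blocks back to drm buddy */
            xe_devm_free_blocks(&blocks);
            return 0;
    }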

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.h        |   9 ++
 drivers/gpu/drm/xe/xe_svm_devmem.c | 146 ++++++++++++++++++++++++++++-
 2 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index b54f7714a1fc..8551df2b9780 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -74,4 +74,13 @@ struct xe_svm_range *xe_svm_range_from_addr(struct xe_svm *svm,
 int xe_svm_build_sg(struct hmm_range *range, struct sg_table *st);
 int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mem);
 void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mem);
+
+
+int xe_devm_alloc_pages(struct xe_tile *tile,
+						unsigned long npages,
+						struct list_head *blocks,
+						unsigned long *pfn);
+
+void xe_devm_free_blocks(struct list_head *blocks);
+void xe_devm_page_free(struct page *page);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_svm_devmem.c b/drivers/gpu/drm/xe/xe_svm_devmem.c
index cf7882830247..445e0e1bc3b4 100644
--- a/drivers/gpu/drm/xe/xe_svm_devmem.c
+++ b/drivers/gpu/drm/xe/xe_svm_devmem.c
@@ -5,18 +5,162 @@
 
 #include <linux/mm_types.h>
 #include <linux/sched/mm.h>
+#include <linux/gfp.h>
+#include <linux/migrate.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-fence.h>
+#include <linux/bitops.h>
+#include <linux/bitmap.h>
+#include <drm/drm_buddy.h>
 
 #include "xe_device_types.h"
 #include "xe_trace.h"
+#include "xe_migrate.h"
+#include "xe_ttm_vram_mgr_types.h"
+#include "xe_assert.h"
 
+/**
+ * struct xe_svm_block_meta - svm uses this data structure to manage each
+ * block allocated from drm buddy. This will be set to the drm_buddy_block's
+ * private field.
+ *
+ * @lru: used to link this block to drm's lru lists. This will be replaced
+ * with struct drm_lru_entity later.
+ * @tile: tile from which we allocated this block
+ * @bitmap: A bitmap of each page in this block. 1 means this page is used,
+ * 0 means this page is idle. When all bits of this block are 0, it is time
+ * to return this block to drm buddy subsystem.
+ */
+struct xe_svm_block_meta {
+	struct list_head lru;
+	struct xe_tile *tile;
+	unsigned long bitmap[];
+};
+
+static u64 block_offset_to_pfn(struct xe_mem_region *mr, u64 offset)
+{
+	/* DRM buddy's block offset is 0-based */
+	offset += mr->hpa_base;
+
+	return PHYS_PFN(offset);
+}
+
+/**
+ * xe_devm_alloc_pages() - allocate device pages from buddy allocator
+ *
+ * @tile: which tile to allocate device memory from
+ * @npages: how many pages to allocate
+ * @blocks: used to return the allocated blocks
+ * @pfn: used to return the pfn of all allocated pages. Must be big enough
+ * to hold at least @npages entries.
+ *
+ * This function allocates blocks of memory from the drm buddy allocator,
+ * and performs initialization work: set struct page::zone_device_data to
+ * point to the memory block; set/initialize drm_buddy_block::private
+ * field; lock each page allocated (via zone_device_page_init); add memory
+ * blocks to the lru manager's lru list - this is TBD.
+ *
+ * Return: 0 on success
+ * error code otherwise
+ */
+int xe_devm_alloc_pages(struct xe_tile *tile,
+						unsigned long npages,
+						struct list_head *blocks,
+						unsigned long *pfn)
+{
+	struct drm_buddy *mm = &tile->mem.vram_mgr->mm;
+	struct drm_buddy_block *block, *tmp;
+	u64 size = npages << PAGE_SHIFT;
+	int ret = 0, i, j = 0;
+
+	ret = drm_buddy_alloc_blocks(mm, 0, mm->size, size, PAGE_SIZE,
+						blocks, DRM_BUDDY_TOPDOWN_ALLOCATION);
+
+	if (unlikely(ret))
+		return ret;
+
+	list_for_each_entry_safe(block, tmp, blocks, link) {
+		struct xe_mem_region *mr = &tile->mem.vram;
+		u64 block_pfn_first, pages_per_block;
+		struct xe_svm_block_meta *meta;
+		u32 meta_size;
+
+		size = drm_buddy_block_size(mm, block);
+		pages_per_block = size >> PAGE_SHIFT;
+		meta_size = BITS_TO_BYTES(pages_per_block) +
+					sizeof(struct xe_svm_block_meta);
+		meta = kzalloc(meta_size, GFP_KERNEL);
+		if (unlikely(!meta)) {
+			/* caller frees the already allocated blocks on error */
+			ret = -ENOMEM;
+			break;
+		}
+		bitmap_fill(meta->bitmap, pages_per_block);
+		meta->tile = tile;
+		block->private = meta;
+		block_pfn_first =
+					block_offset_to_pfn(mr, drm_buddy_block_offset(block));
+		for (i = 0; i < pages_per_block; i++) {
+			struct page *page;
+
+			pfn[j++] = block_pfn_first + i;
+			page = pfn_to_page(block_pfn_first + i);
+			/* Lock page per hmm requirement, see hmm.rst. */
+			zone_device_page_init(page);
+			page->zone_device_data = block;
+		}
+	}
+
+	return ret;
+}
+
+/* FIXME: we locked the page by calling zone_device_page_init
+ * in xe_devm_alloc_pages. Should we unlock pages here?
+ */
+static void free_block(struct drm_buddy_block *block)
+{
+	struct xe_svm_block_meta *meta =
+		(struct xe_svm_block_meta *)block->private;
+	struct xe_tile *tile  = meta->tile;
+	struct drm_buddy *mm = &tile->mem.vram_mgr->mm;
+
+	kfree(block->private);
+	drm_buddy_free_block(mm, block);
+}
+
+/**
+ * xe_devm_free_blocks() - free all memory blocks
+ *
+ * @blocks: memory blocks list head
+ */
+void xe_devm_free_blocks(struct list_head *blocks)
+{
+	struct drm_buddy_block *block, *tmp;
+
+	list_for_each_entry_safe(block, tmp, blocks, link)
+		free_block(block);
+}
 
 static vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
 {
 	return 0;
 }
 
-static void xe_devm_page_free(struct page *page)
+void xe_devm_page_free(struct page *page)
 {
+	struct drm_buddy_block *block =
+					(struct drm_buddy_block *)page->zone_device_data;
+	struct xe_svm_block_meta *meta =
+					(struct xe_svm_block_meta *)block->private;
+	struct xe_tile *tile  = meta->tile;
+	struct xe_mem_region *mr = &tile->mem.vram;
+	struct drm_buddy *mm = &tile->mem.vram_mgr->mm;
+	u64 size = drm_buddy_block_size(mm, block);
+	u64 pages_per_block = size >> PAGE_SHIFT;
+	u64 block_pfn_first =
+					block_offset_to_pfn(mr, drm_buddy_block_offset(block));
+	u64 page_pfn = page_to_pfn(page);
+	u64 i = page_pfn - block_pfn_first;
+
+	xe_assert(tile->xe, i < pages_per_block);
+	clear_bit(i, meta->bitmap);
+	if (bitmap_empty(meta->bitmap, pages_per_block))
+		free_block(block);
 }
 
 static const struct dev_pagemap_ops xe_devm_pagemap_ops = {
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 12/22] drm/xe/svm: Trace buddy block allocation and free
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (10 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 11/22] drm/xe/svm: implement functions to allocate and free device memory Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 13/22] drm/xe/svm: Handle CPU page fault Oak Zeng
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm_devmem.c |  5 ++++-
 drivers/gpu/drm/xe/xe_trace.h      | 35 ++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_svm_devmem.c b/drivers/gpu/drm/xe/xe_svm_devmem.c
index 445e0e1bc3b4..5cd54dde4a9d 100644
--- a/drivers/gpu/drm/xe/xe_svm_devmem.c
+++ b/drivers/gpu/drm/xe/xe_svm_devmem.c
@@ -95,6 +95,7 @@ int xe_devm_alloc_pages(struct xe_tile *tile,
 		block->private = meta;
 		block_pfn_first =
 					block_offset_to_pfn(mr, drm_buddy_block_offset(block));
+		trace_xe_buddy_block_alloc(block, size, block_pfn_first);
 		for(i = 0; i < pages_per_block; i++) {
 			struct page *page;
 
@@ -159,8 +160,10 @@ void xe_devm_page_free(struct page *page)
 
 	xe_assert(tile->xe, i < pages_per_block);
 	clear_bit(i, meta->bitmap);
-	if (bitmap_empty(meta->bitmap, pages_per_block))
-		free_block(block);
+	if (bitmap_empty(meta->bitmap, pages_per_block)) {
+		/* Trace before free_block(): once the block is returned to
+		 * drm buddy it may be merged and freed.
+		 */
+		trace_xe_buddy_block_free(block, size, block_pfn_first);
+		free_block(block);
+	}
 }
 
 static const struct dev_pagemap_ops xe_devm_pagemap_ops = {
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 63867c0fa848..50380f5173ca 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -11,6 +11,7 @@
 
 #include <linux/tracepoint.h>
 #include <linux/types.h>
+#include <drm/drm_buddy.h>
 
 #include "xe_bo_types.h"
 #include "xe_exec_queue_types.h"
@@ -600,6 +601,40 @@ DEFINE_EVENT_PRINT(xe_guc_ctb, xe_guc_ctb_g2h,
 
 );
 
+DECLARE_EVENT_CLASS(xe_buddy_block,
+               TP_PROTO(struct drm_buddy_block *block, u64 size, u64 pfn),
+               TP_ARGS(block, size, pfn),
+
+               TP_STRUCT__entry(
+                               __field(u64, block)
+                               __field(u64, header)
+                               __field(u64, size)
+                               __field(u64, pfn)
+               ),
+
+               TP_fast_assign(
+                               __entry->block = (u64)block;
+                               __entry->header = block->header;
+                               __entry->size = size;
+                               __entry->pfn = pfn;
+               ),
+
+               TP_printk("xe svm: allocated block %llx, block header %llx, size %llx, pfn %llx\n",
+                       __entry->block, __entry->header, __entry->size, __entry->pfn)
+);
+
+
+DEFINE_EVENT(xe_buddy_block, xe_buddy_block_alloc,
+               TP_PROTO(struct drm_buddy_block *block, u64 size, u64 pfn),
+               TP_ARGS(block, size, pfn)
+);
+
+
+DEFINE_EVENT(xe_buddy_block, xe_buddy_block_free,
+               TP_PROTO(struct drm_buddy_block *block, u64 size, u64 pfn),
+               TP_ARGS(block, size, pfn)
+);
+
 #endif
 
 /* This part must be outside protection */
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 13/22] drm/xe/svm: Handle CPU page fault
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (11 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 12/22] drm/xe/svm: Trace buddy block allocation and free Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 14/22] drm/xe/svm: trace svm range migration Oak Zeng
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Under SVM, the CPU program and the GPU program share one virtual
address space. The backing store of this virtual address space can be
either system memory or device memory. Since GPU device memory is
remapped as DEVICE_PRIVATE, the CPU can't access it. Any CPU access to
device memory causes a page fault. Implement a page fault handler to
migrate memory back to system memory and map it into the CPU page table
so the CPU program can proceed.

Also unbind this page from the GPU side, and free the original GPU
device page.
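
The handler is reached through the dev_pagemap_ops of the DEVICE_PRIVATE
region set up in the earlier remap patch; the exact hookup is not part of
this hunk, so treat the assignment below as an assumption shown only for
orientation:

    static const struct dev_pagemap_ops xe_devm_pagemap_ops = {
            .page_free = xe_devm_page_free,
            /* core mm calls this on any CPU touch of a device page */
            .migrate_to_ram = xe_devm_migrate_to_ram,
    };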

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h |  12 ++
 drivers/gpu/drm/xe/xe_svm.h          |   8 +-
 drivers/gpu/drm/xe/xe_svm_devmem.c   |  10 +-
 drivers/gpu/drm/xe/xe_svm_migrate.c  | 230 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm_range.c    |  27 ++++
 5 files changed, 280 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_svm_migrate.c

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index c67c28f04d2f..ac77996bebe6 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -555,4 +555,16 @@ struct xe_file {
 	struct xe_drm_client *client;
 };
 
+static inline struct xe_tile *mem_region_to_tile(struct xe_mem_region *mr)
+{
+	return container_of(mr, struct xe_tile, mem.vram);
+}
+
+static inline u64 vram_pfn_to_dpa(struct xe_mem_region *mr, u64 pfn)
+{
+	u64 dpa;
+	u64 offset = (pfn << PAGE_SHIFT) - mr->hpa_base;
+	dpa = mr->dpa_base + offset;
+	return dpa;
+}
 #endif
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 8551df2b9780..6b93055934f8 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -12,8 +12,10 @@
 #include <linux/rbtree_types.h>
 #include <linux/interval_tree.h>
 #include <linux/hashtable.h>
+#include <linux/mm_types.h>
 #include <linux/types.h>
 #include <linux/hmm.h>
+#include <linux/mm.h>
 #include "xe_device_types.h"
 
 struct xe_vm;
@@ -66,16 +68,20 @@ struct xe_svm_range {
 	struct interval_tree_node inode;
 };
 
+vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf);
 void xe_destroy_svm(struct xe_svm *svm);
 struct xe_svm *xe_create_svm(struct xe_vm *vm);
 struct xe_svm *xe_lookup_svm_by_mm(struct mm_struct *mm);
 struct xe_svm_range *xe_svm_range_from_addr(struct xe_svm *svm,
 								unsigned long addr);
+bool xe_svm_range_belongs_to_vma(struct mm_struct *mm,
+								struct xe_svm_range *range,
+								struct vm_area_struct *vma);
+
 int xe_svm_build_sg(struct hmm_range *range, struct sg_table *st);
 int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mem);
 void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mem);
 
-
 int xe_devm_alloc_pages(struct xe_tile *tile,
 						unsigned long npages,
 						struct list_head *blocks,
diff --git a/drivers/gpu/drm/xe/xe_svm_devmem.c b/drivers/gpu/drm/xe/xe_svm_devmem.c
index 5cd54dde4a9d..01f8385ebb5b 100644
--- a/drivers/gpu/drm/xe/xe_svm_devmem.c
+++ b/drivers/gpu/drm/xe/xe_svm_devmem.c
@@ -11,13 +11,16 @@
 #include <linux/dma-fence.h>
 #include <linux/bitops.h>
 #include <linux/bitmap.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
 #include <drm/drm_buddy.h>
-
 #include "xe_device_types.h"
 #include "xe_trace.h"
 #include "xe_migrate.h"
 #include "xe_ttm_vram_mgr_types.h"
 #include "xe_assert.h"
+#include "xe_pt.h"
+#include "xe_svm.h"
 
 /**
  * struct xe_svm_block_meta - svm uses this data structure to manage each
@@ -137,11 +140,6 @@ void xe_devm_free_blocks(struct list_head *blocks)
 		free_block(block);
 }
 
-static vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
-{
-	return 0;
-}
-
 void xe_devm_page_free(struct page *page)
 {
 	struct drm_buddy_block *block =
diff --git a/drivers/gpu/drm/xe/xe_svm_migrate.c b/drivers/gpu/drm/xe/xe_svm_migrate.c
new file mode 100644
index 000000000000..3be26da33aa3
--- /dev/null
+++ b/drivers/gpu/drm/xe/xe_svm_migrate.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2023 Intel Corporation
+ */
+
+#include <linux/gfp.h>
+#include <linux/migrate.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-fence.h>
+#include <linux/bitops.h>
+#include <linux/bitmap.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <drm/drm_buddy.h>
+#include "xe_device_types.h"
+#include "xe_trace.h"
+#include "xe_migrate.h"
+#include "xe_ttm_vram_mgr_types.h"
+#include "xe_assert.h"
+#include "xe_pt.h"
+#include "xe_svm.h"
+
+
+/**
+ * alloc_host_page() - allocate one host page for the fault vma
+ *
+ * @dev: (GPU) device that will access the allocated page
+ * @vma: the fault vma that we need to allocate a page for
+ * @addr: the fault address. The allocated page is for this address
+ * @dma_addr: used to output the dma address of the allocated page.
+ * This dma address will be used by the GPU to access this page. The GPU
+ * accesses the host page through a dma-mapped address.
+ * @pfn: used to output the pfn of the allocated page.
+ *
+ * This function allocates one host page for the specified vma. It
+ * also does some preparation work for the GPU to access this page, such
+ * as mapping this page to the iommu (by calling dma_map_page).
+ *
+ * When this function returns, the page is locked.
+ *
+ * Return: struct page pointer on success,
+ * NULL otherwise
+ */
+static struct page *alloc_host_page(struct device *dev,
+							 struct vm_area_struct *vma,
+							 unsigned long addr,
+							 dma_addr_t *dma_addr,
+							 unsigned long *pfn)
+{
+	struct page *page;
+
+	page = alloc_page_vma(GFP_HIGHUSER, vma, addr);
+	if (unlikely(!page))
+		return NULL;
+
+	/* Lock page per hmm requirement, see hmm.rst */
+	lock_page(page);
+	*dma_addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
+	if (unlikely(dma_mapping_error(dev, *dma_addr))) {
+		unlock_page(page);
+		__free_page(page);
+		return NULL;
+	}
+
+	*pfn = migrate_pfn(page_to_pfn(page));
+	return page;
+}
+
+static void free_host_page(struct page *page)
+{
+	unlock_page(page);
+	put_page(page);
+}
+
+static inline struct xe_mem_region *page_to_mem_region(struct page *page)
+{
+	return container_of(page->pgmap, struct xe_mem_region, pagemap);
+}
+
+/**
+ * migrate_page_vram_to_ram() - migrate one page from vram to ram
+ *
+ * @vma: The vma that the page is mapped to
+ * @addr: The virtual address that the page is mapped to
+ * @src_pfn: src page's page frame number
+ * @dst_pfn: used to return the destination page's (in system ram) pfn
+ *
+ * Allocate one page in system ram and copy memory from device memory
+ * to system ram.
+ *
+ * Return: 0 if this page is already in sram (no need to migrate),
+ * 1 if this page was successfully migrated from vram to sram,
+ * error code otherwise
+ */
+static int migrate_page_vram_to_ram(struct vm_area_struct *vma, unsigned long addr,
+						unsigned long src_pfn, unsigned long *dst_pfn)
+{
+	struct xe_mem_region *mr;
+	struct xe_tile *tile;
+	struct xe_device *xe;
+	struct device *dev;
+	dma_addr_t dma_addr = 0;
+	struct dma_fence *fence;
+	struct page *host_page;
+	struct page *src_page;
+	u64 src_dpa;
+
+	src_page = migrate_pfn_to_page(src_pfn);
+	if (unlikely(!src_page || !(src_pfn & MIGRATE_PFN_MIGRATE)))
+		return 0;
+
+	mr = page_to_mem_region(src_page);
+	tile = mem_region_to_tile(mr);
+	xe = tile_to_xe(tile);
+	dev = xe->drm.dev;
+
+	src_dpa = vram_pfn_to_dpa(mr, src_pfn);
+	host_page = alloc_host_page(dev, vma, addr, &dma_addr, dst_pfn);
+	if (!host_page)
+		return -ENOMEM;
+
+	fence = xe_migrate_svm(tile->migrate, src_dpa, true,
+						dma_addr, false, PAGE_SIZE);
+	if (IS_ERR(fence)) {
+		dma_unmap_page(dev, dma_addr, PAGE_SIZE, DMA_FROM_DEVICE);
+		free_host_page(host_page);
+		return PTR_ERR(fence);
+	}
+
+	dma_fence_wait(fence, false);
+	dma_fence_put(fence);
+	dma_unmap_page(dev, dma_addr, PAGE_SIZE, DMA_FROM_DEVICE);
+	return 1;
+}
+
+/**
+ * xe_devm_migrate_to_ram() - Migrate memory back to sram on CPU page fault
+ *
+ * @vmf: cpu vm fault structure, contains fault information such as vma etc.
+ *
+ * Note, this is in CPU's vm fault handler, caller holds the mmap read lock.
+ * FIXME: relook the lock design here. Is there any deadlock?
+ *
+ * This function migrates the svm range which contains the fault address to sram.
+ * We try to maintain a 1:1 mapping b/t the vma and svm_range (i.e., create one
+ * svm range for one vma initially and try not to split it). So this scheme ends
+ * up migrating at vma granularity. This might not be the most performant scheme
+ * when the GPU is in the picture.
+ *
+ * This can be tuned with a migration granularity for performance, for example,
+ * migrating 2M for each CPU page fault, or letting the user specify how much to
+ * migrate. But this is more complicated as such a scheme requires vma and
+ * svm_range splitting.
+ *
+ * This function should also update the GPU page table, so the fault virtual
+ * address points to the same sram location from the GPU side. This is TBD.
+ *
+ * Return:
+ * 0 on success
+ * VM_FAULT_SIGBUS: failed to migrate page to system memory, application
+ * will be signaled a SIGBUS
+ */
+vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
+{
+	struct xe_mem_region *mr = page_to_mem_region(vmf->page);
+	struct xe_tile *tile = mem_region_to_tile(mr);
+	struct xe_device *xe = tile_to_xe(tile);
+	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *mm = vma->vm_mm;
+	struct xe_svm *svm = xe_lookup_svm_by_mm(mm);
+	struct xe_svm_range *range = xe_svm_range_from_addr(svm, vmf->address);
+	struct xe_vm *vm = svm->vm;
+	u64 npages = (range->end - range->start) >> PAGE_SHIFT;
+	unsigned long addr = range->start;
+	vm_fault_t ret = 0;
+	void *buf;
+	int i, r;
+
+	struct migrate_vma migrate_vma = {
+		.vma		= vmf->vma,
+		.start		= range->start,
+		.end		= range->end,
+		.pgmap_owner	= xe->drm.dev,
+		.flags		= MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
+		.fault_page = vmf->page,
+	};
+
+	xe_assert(xe, IS_ALIGNED(vmf->address, PAGE_SIZE));
+	xe_assert(xe, IS_ALIGNED(range->start, PAGE_SIZE));
+	xe_assert(xe, IS_ALIGNED(range->end, PAGE_SIZE));
+	/* FIXME: in case of vma split, the svm range might not belong to one vma */
+	xe_assert(xe, xe_svm_range_belongs_to_vma(mm, range, vma));
+
+	buf = kvcalloc(npages, 2 * sizeof(*migrate_vma.src), GFP_KERNEL);
+	if (!buf)
+		return VM_FAULT_OOM;
+	migrate_vma.src = buf;
+	migrate_vma.dst = buf + npages;
+	if (migrate_vma_setup(&migrate_vma) < 0) {
+		ret = VM_FAULT_SIGBUS;
+		goto free_buf;
+	}
+
+	if (!migrate_vma.cpages)
+		goto free_buf;
+
+	for (i = 0; i < npages; i++) {
+		r = migrate_page_vram_to_ram(vma, addr, migrate_vma.src[i],
+							migrate_vma.dst + i);
+		if (r < 0) {
+			ret = VM_FAULT_SIGBUS;
+			break;
+		}
+
+		/* Migration has been successful, unbind src page from gpu,
+		 * and free the source page
+		 */
+		if (r == 1) {
+			struct page *src_page = migrate_pfn_to_page(migrate_vma.src[i]);
+
+			xe_invalidate_svm_range(vm, addr, PAGE_SIZE);
+			xe_devm_page_free(src_page);
+		}
+
+		addr += PAGE_SIZE;
+	}
+
+	migrate_vma_pages(&migrate_vma);
+	migrate_vma_finalize(&migrate_vma);
+free_buf:
+	kvfree(buf);
+	return ret;
+}
diff --git a/drivers/gpu/drm/xe/xe_svm_range.c b/drivers/gpu/drm/xe/xe_svm_range.c
index d8251d38f65e..b32c32f60315 100644
--- a/drivers/gpu/drm/xe/xe_svm_range.c
+++ b/drivers/gpu/drm/xe/xe_svm_range.c
@@ -5,7 +5,9 @@
 
 #include <linux/interval_tree.h>
 #include <linux/container_of.h>
+#include <linux/mm_types.h>
 #include <linux/mutex.h>
+#include <linux/mm.h>
 #include "xe_svm.h"
 
 /**
@@ -30,3 +32,28 @@ struct xe_svm_range *xe_svm_range_from_addr(struct xe_svm *svm,
 
 	return container_of(node, struct xe_svm_range, inode);
 }
+
+/**
+ * xe_svm_range_belongs_to_vma() - determine a virtual address range
+ * belongs to a vma or not
+ *
+ * @mm: the mm of the virtual address range
+ * @range: the svm virtual address range
+ * @vma: the vma to determine the range
+ *
+ * Returns true if range belongs to vma
+ * false otherwise
+ */
+bool xe_svm_range_belongs_to_vma(struct mm_struct *mm,
+								struct xe_svm_range *range,
+								struct vm_area_struct *vma)
+{
+	struct vm_area_struct *vma1, *vma2;
+	unsigned long start = range->start;
+	unsigned long end = range->end;
+
+	vma1  = find_vma_intersection(mm, start, start + 4);
+	vma2  = find_vma_intersection(mm, end - 4, end);
+
+	return (vma1 == vma) && (vma2 == vma);
+}
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 14/22] drm/xe/svm: trace svm range migration
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (12 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 13/22] drm/xe/svm: Handle CPU page fault Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 15/22] drm/xe/svm: Implement functions to register and unregister mmu notifier Oak Zeng
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Add functions to trace svm range migration, either
from vram to sram or from sram to vram.

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm_migrate.c |  1 +
 drivers/gpu/drm/xe/xe_trace.h       | 30 +++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_svm_migrate.c b/drivers/gpu/drm/xe/xe_svm_migrate.c
index 3be26da33aa3..b4df411e04f3 100644
--- a/drivers/gpu/drm/xe/xe_svm_migrate.c
+++ b/drivers/gpu/drm/xe/xe_svm_migrate.c
@@ -201,6 +201,7 @@ vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
 	if (!migrate_vma.cpages)
 		goto free_buf;
 
+	trace_xe_svm_migrate_vram_to_sram(range);
 	for (i = 0; i < npages; i++) {
 		ret = migrate_page_vram_to_ram(vma, addr, migrate_vma.src[i],
 							migrate_vma.dst + i);
diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h
index 50380f5173ca..960eec38aee5 100644
--- a/drivers/gpu/drm/xe/xe_trace.h
+++ b/drivers/gpu/drm/xe/xe_trace.h
@@ -21,6 +21,7 @@
 #include "xe_guc_exec_queue_types.h"
 #include "xe_sched_job.h"
 #include "xe_vm.h"
+#include "xe_svm.h"
 
 DECLARE_EVENT_CLASS(xe_gt_tlb_invalidation_fence,
 		    TP_PROTO(struct xe_gt_tlb_invalidation_fence *fence),
@@ -601,6 +602,35 @@ DEFINE_EVENT_PRINT(xe_guc_ctb, xe_guc_ctb_g2h,
 
 );
 
+DECLARE_EVENT_CLASS(xe_svm_migrate,
+		    TP_PROTO(struct xe_svm_range *range),
+		    TP_ARGS(range),
+
+		    TP_STRUCT__entry(
+			     __field(u64, start)
+			     __field(u64, end)
+			     ),
+
+		    TP_fast_assign(
+			   __entry->start = range->start;
+			   __entry->end = range->end;
+			   ),
+
+		    TP_printk("Migrate svm range [0x%016llx,0x%016llx)",  __entry->start,
+			      __entry->end)
+);
+
+DEFINE_EVENT(xe_svm_migrate, xe_svm_migrate_vram_to_sram,
+		    TP_PROTO(struct xe_svm_range *range),
+		    TP_ARGS(range)
+);
+
+
+DEFINE_EVENT(xe_svm_migrate, xe_svm_migrate_sram_to_vram,
+		    TP_PROTO(struct xe_svm_range *range),
+		    TP_ARGS(range)
+);
+
 DECLARE_EVENT_CLASS(xe_buddy_block,
                TP_PROTO(struct drm_buddy_block *block, u64 size, u64 pfn),
                TP_ARGS(block, size, pfn),
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 15/22] drm/xe/svm: Implement functions to register and unregister mmu notifier
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (13 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 14/22] drm/xe/svm: trace svm range migration Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 16/22] drm/xe/svm: Implement the mmu notifier range invalidate callback Oak Zeng
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

The xe driver registers an mmu interval notifier with core mm to monitor
vma changes. We register one mmu interval notifier for each svm range.
The mmu interval notifier has to be unregistered from a worker (see the
next patch in this series), so also initialize a kernel worker that will
unregister the mmu interval notifier.
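
A caller-side sketch of the intended usage (the caller function is
hypothetical; it assumes range->svm and range->vma are already set up):

    static int example_monitor_range(struct xe_svm_range *range)
    {
            struct mm_struct *mm = range->svm->mm;
            int ret;

            mmap_read_lock(mm);
            /* may temporarily upgrade to the mmap_write_lock internally */
            ret = xe_svm_range_register_mmu_notifier(range);
            mmap_read_unlock(mm);

            return ret;
    }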

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.h       | 14 ++++++
 drivers/gpu/drm/xe/xe_svm_range.c | 73 +++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 6b93055934f8..90e665f2bfc6 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -52,16 +52,28 @@ struct xe_svm {
  * struct xe_svm_range - Represents a shared virtual address range.
  */
 struct xe_svm_range {
+	/** @svm: pointer of the xe_svm that this range belongs to */
+	struct xe_svm *svm;
+
 	/** @notifier: The mmu interval notifer used to keep track of CPU
 	 * side address range change. Driver will get a callback with this
 	 * notifier if anything changed from CPU side, such as range is
 	 * unmapped from CPU
 	 */
 	struct mmu_interval_notifier notifier;
+	bool mmu_notifier_registered;
 	/** @start: start address of this range, inclusive */
 	u64 start;
 	/** @end: end address of this range, exclusive */
 	u64 end;
+	/** @vma: the corresponding vma of this svm range
+	 *  The relationship b/t vma and svm range is 1:N,
+	 *  which means one vma can be split into multiple
+	 *  @xe_svm_range while one @xe_svm_range can have
+	 *  only one vma. An N:N mapping would complicate the
+	 *  code. Let's assume 1:N for now.
+	 */
+	struct vm_area_struct *vma;
 	/** @unregister_notifier_work: A worker used to unregister this notifier */
 	struct work_struct unregister_notifier_work;
 	/** @inode: used to link this range to svm's range_tree */
@@ -77,6 +89,8 @@ struct xe_svm_range *xe_svm_range_from_addr(struct xe_svm *svm,
 bool xe_svm_range_belongs_to_vma(struct mm_struct *mm,
 								struct xe_svm_range *range,
 								struct vm_area_struct *vma);
+void xe_svm_range_unregister_mmu_notifier(struct xe_svm_range *range);
+int xe_svm_range_register_mmu_notifier(struct xe_svm_range *range);
 
 int xe_svm_build_sg(struct hmm_range *range, struct sg_table *st);
 int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mem);
diff --git a/drivers/gpu/drm/xe/xe_svm_range.c b/drivers/gpu/drm/xe/xe_svm_range.c
index b32c32f60315..286d5f7d6ecd 100644
--- a/drivers/gpu/drm/xe/xe_svm_range.c
+++ b/drivers/gpu/drm/xe/xe_svm_range.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/interval_tree.h>
+#include <linux/mmu_notifier.h>
 #include <linux/container_of.h>
 #include <linux/mm_types.h>
 #include <linux/mutex.h>
@@ -57,3 +58,75 @@ bool xe_svm_range_belongs_to_vma(struct mm_struct *mm,
 
 	return (vma1 == vma) && (vma2 == vma);
 }
+
+static const struct mmu_interval_notifier_ops xe_svm_mni_ops = {
+	.invalidate = NULL,
+};
+
+/**
+ * xe_svm_range_unregister_mmu_notifier() - unregister the mmu interval
+ * notifier of a svm range
+ *
+ * @range: svm range
+ */
+void xe_svm_range_unregister_mmu_notifier(struct xe_svm_range *range)
+{
+	if (!range->mmu_notifier_registered)
+		return;
+
+	mmu_interval_notifier_remove(&range->notifier);
+	range->mmu_notifier_registered = false;
+}
+
+static void xe_svm_unregister_notifier_work(struct work_struct *work)
+{
+	struct xe_svm_range *range;
+
+	range = container_of(work, struct xe_svm_range, unregister_notifier_work);
+
+	xe_svm_range_unregister_mmu_notifier(range);
+
+	/**
+	 * This is called from mmu notifier MUNMAP event. When munmap is called,
+	 * this range is not valid any more. Remove it.
+	 */
+	mutex_lock(&range->svm->mutex);
+	interval_tree_remove(&range->inode, &range->svm->range_tree);
+	mutex_unlock(&range->svm->mutex);
+	kfree(range);
+}
+
+/**
+ * xe_svm_range_register_mmu_notifier() - register a mmu interval notifier
+ * to monitor vma changes
+ *
+ * @range: svm range to monitor
+ *
+ * This has to be called with the mmap_read_lock held
+ */
+int xe_svm_range_register_mmu_notifier(struct xe_svm_range *range)
+{
+	struct vm_area_struct *vma = range->vma;
+	struct mm_struct *mm = range->svm->mm;
+	u64 start, length;
+	int ret = 0;
+
+	if (range->mmu_notifier_registered)
+		return 0;
+
+	start =  range->start;
+	length = range->end - start;
+	/* We are called under the mmap_read_lock, but registering the mmu
+	 * interval notifier requires the mmap_write_lock.
+	 */
+	mmap_read_unlock(mm);
+	mmap_write_lock(mm);
+	ret = mmu_interval_notifier_insert_locked(&range->notifier, vma->vm_mm,
+						start, length, &xe_svm_mni_ops);
+	mmap_write_downgrade(mm);
+	if (ret)
+		return ret;
+
+	INIT_WORK(&range->unregister_notifier_work, xe_svm_unregister_notifier_work);
+	range->mmu_notifier_registered = true;
+	return ret;
+}
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 16/22] drm/xe/svm: Implement the mmu notifier range invalidate callback
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (14 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 15/22] drm/xe/svm: Implement functions to register and unregister mmu notifier Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 17/22] drm/xe/svm: clean up svm range during process exit Oak Zeng
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

To mirror the CPU page table from the GPU side, we register a mmu
interval notifier (see the previous patch of this series). Core mm calls
back into the GPU driver whenever there is a change to a certain virtual
address range, i.e., the range is released or unmapped by the user etc.

This patch implements the GPU driver callback function for such a mmu
interval notifier. In the callback function we unbind the address
range from the GPU if it is unmapped from the CPU side, thus we mirror
the CPU page table change.

We also unregister the mmu interval notifier from core mm in the case
of a munmap event. But we can't unregister the mmu notifier directly
from the mmu notifier range invalidation callback function. The reason
is, during a munmap (see kernel function vm_munmap), a mmap_write_lock
is held, but unregistering the mmu notifier (calling
mmu_interval_notifier_remove) also requires a mmap_write_lock of the
current process.

Thus, we start a kernel worker to unregister the mmu interval notifier
on a MMU_NOTIFY_UNMAP event.
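
Condensed from this patch and the previous one, the resulting pattern
looks roughly like this (shortened; the real callback also takes
svm->mutex and invalidates the GPU mapping):

    /* in the invalidate callback: cannot remove the notifier here */
    if (range->event == MMU_NOTIFY_UNMAP)
            queue_work(system_unbound_wq,
                       &svm_range->unregister_notifier_work);

    /* in the worker: safe place to call mmu_interval_notifier_remove() */
    static void xe_svm_unregister_notifier_work(struct work_struct *work)
    {
            struct xe_svm_range *range =
                    container_of(work, struct xe_svm_range,
                                 unregister_notifier_work);

            xe_svm_range_unregister_mmu_notifier(range);
            /* the range itself is also removed from the tree and freed */
    }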

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c       |  1 +
 drivers/gpu/drm/xe/xe_svm.h       |  1 -
 drivers/gpu/drm/xe/xe_svm_range.c | 37 ++++++++++++++++++++++++++++++-
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index ab3cc2121869..6393251c0051 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -8,6 +8,7 @@
 #include "xe_svm.h"
 #include <linux/hmm.h>
 #include <linux/scatterlist.h>
+#include "xe_pt.h"
 
 DEFINE_HASHTABLE(xe_svm_table, XE_MAX_SVM_PROCESS);
 
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 90e665f2bfc6..0038f98c0cc7 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -54,7 +54,6 @@ struct xe_svm {
 struct xe_svm_range {
 	/** @svm: pointer of the xe_svm that this range belongs to */
 	struct xe_svm *svm;
-
 	/** @notifier: The mmu interval notifer used to keep track of CPU
 	 * side address range change. Driver will get a callback with this
 	 * notifier if anything changed from CPU side, such as range is
diff --git a/drivers/gpu/drm/xe/xe_svm_range.c b/drivers/gpu/drm/xe/xe_svm_range.c
index 286d5f7d6ecd..53dd3be7ab9f 100644
--- a/drivers/gpu/drm/xe/xe_svm_range.c
+++ b/drivers/gpu/drm/xe/xe_svm_range.c
@@ -10,6 +10,7 @@
 #include <linux/mutex.h>
 #include <linux/mm.h>
 #include "xe_svm.h"
+#include "xe_pt.h"
 
 /**
  * xe_svm_range_from_addr() - retrieve svm_range contains a virtual address
@@ -59,8 +60,42 @@ bool xe_svm_range_belongs_to_vma(struct mm_struct *mm,
 	return (vma1 == vma) && (vma2 == vma);
 }
 
+static bool xe_svm_range_invalidate(struct mmu_interval_notifier *mni,
+				      const struct mmu_notifier_range *range,
+				      unsigned long cur_seq)
+{
+	struct xe_svm_range *svm_range =
+		container_of(mni, struct xe_svm_range, notifier);
+	struct xe_svm *svm = svm_range->svm;
+	unsigned long length = range->end - range->start;
+
+	/*
+	 * MMU_NOTIFY_RELEASE is called upon process exit to notify the
+	 * driver to release any process resources, such as zapping the
+	 * GPU page table mapping or unregistering the mmu notifier etc.
+	 * We already clear the GPU page table and unregister the mmu
+	 * notifier in xe_destroy_svm upon process exit. So simply return
+	 * here.
+	 */
+	if (range->event == MMU_NOTIFY_RELEASE)
+		return true;
+
+	if (mmu_notifier_range_blockable(range))
+		mutex_lock(&svm->mutex);
+	else if (!mutex_trylock(&svm->mutex))
+		return false;
+
+	mmu_interval_set_seq(mni, cur_seq);
+	xe_invalidate_svm_range(svm->vm, range->start, length);
+	mutex_unlock(&svm->mutex);
+
+	if (range->event == MMU_NOTIFY_UNMAP)
+		queue_work(system_unbound_wq, &svm_range->unregister_notifier_work);
+
+	return true;
+}
+
 static const struct mmu_interval_notifier_ops xe_svm_mni_ops = {
-	.invalidate = NULL,
+	.invalidate = xe_svm_range_invalidate,
 };
 
 /**
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 17/22] drm/xe/svm: clean up svm range during process exit
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (15 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 16/22] drm/xe/svm: Implement the mmu notifier range invalidate callback Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 18/22] drm/xe/svm: Move a few structures to xe_gt.h Oak Zeng
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Clean up svm ranges during process exit: zap the GPU page table of
the svm process on process exit; unregister all the mmu interval
notifiers which were registered before; free the svm ranges and the
svm data structure.
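
For orientation, a sketch of the expected call site (hypothetical; the
actual hook into VM teardown mirrors the svm creation done at VM
creation time earlier in this series):

    /* hypothetical teardown-time caller, e.g. from VM destruction */
    if (vm->svm) {
            xe_destroy_svm(vm->svm);
            vm->svm = NULL;
    }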

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c       | 24 ++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm.h       |  1 +
 drivers/gpu/drm/xe/xe_svm_range.c | 17 +++++++++++++++++
 3 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 6393251c0051..5772bfcf7da4 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -9,6 +9,8 @@
 #include <linux/hmm.h>
 #include <linux/scatterlist.h>
 #include "xe_pt.h"
+#include "xe_assert.h"
+#include "xe_vm_types.h"
 
 DEFINE_HASHTABLE(xe_svm_table, XE_MAX_SVM_PROCESS);
 
@@ -19,9 +21,31 @@ DEFINE_HASHTABLE(xe_svm_table, XE_MAX_SVM_PROCESS);
  */
 void xe_destroy_svm(struct xe_svm *svm)
 {
+#define MAX_SVM_RANGE (1024*1024)
+	struct xe_svm_range **range_array;
+	struct interval_tree_node *node;
+	struct xe_svm_range *range;
+	int i = 0;
+
+	range_array = kzalloc(sizeof(struct xe_svm_range *) * MAX_SVM_RANGE,
+							GFP_KERNEL);
+	node = interval_tree_iter_first(&svm->range_tree, 0, ~0ULL);
+	while (node) {
+		range = container_of(node, struct xe_svm_range, inode);
+		xe_svm_range_prepare_destroy(range);
+		node = interval_tree_iter_next(node, 0, ~0ULL);
+		xe_assert(svm->vm->xe, i < MAX_SVM_RANGE);
+		range_array[i++] = range;
+	}
+
+	/* Freeing a range (thus range->inode) while traversing above is not safe */
+	for (; i >= 0; i--)
+		kfree(range_array[i]);
+
 	hash_del_rcu(&svm->hnode);
 	mutex_destroy(&svm->mutex);
 	kfree(svm);
+	kfree(range_array);
 }
 
 /**
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 0038f98c0cc7..5b3bd2c064f5 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -90,6 +90,7 @@ bool xe_svm_range_belongs_to_vma(struct mm_struct *mm,
 								struct vm_area_struct *vma);
 void xe_svm_range_unregister_mmu_notifier(struct xe_svm_range *range);
 int xe_svm_range_register_mmu_notifier(struct xe_svm_range *range);
+void xe_svm_range_prepare_destroy(struct xe_svm_range *range);
 
 int xe_svm_build_sg(struct hmm_range *range, struct sg_table *st);
 int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mem);
diff --git a/drivers/gpu/drm/xe/xe_svm_range.c b/drivers/gpu/drm/xe/xe_svm_range.c
index 53dd3be7ab9f..dfb4660dc26f 100644
--- a/drivers/gpu/drm/xe/xe_svm_range.c
+++ b/drivers/gpu/drm/xe/xe_svm_range.c
@@ -165,3 +165,20 @@ int xe_svm_range_register_mmu_notifier(struct xe_svm_range *range)
 	range->mmu_notifier_registered = true;
 	return ret;
 }
+
+/**
+ * xe_svm_range_prepare_destroy() - prepare work to destroy a svm range
+ *
+ * @range: the svm range to destroy
+ *
+ * prepare for a svm range destroy: Zap this range from GPU, unregister mmu
+ * notifier.
+ */
+void xe_svm_range_prepare_destroy(struct xe_svm_range *range)
+{
+	struct xe_vm *vm = range->svm->vm;
+	unsigned long length = range->end - range->start;
+
+	xe_invalidate_svm_range(vm, range->start, length);
+	xe_svm_range_unregister_mmu_notifier(range);
+}
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 18/22] drm/xe/svm: Move a few structures to xe_gt.h
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (16 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 17/22] drm/xe/svm: clean up svm range during process exit Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 19/22] drm/xe/svm: migrate svm range to vram Oak Zeng
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Move the access_type enum and the pagefault struct to a header file so
they can be shared with the svm sub-system. This is preparation work
for enabling page faults for svm.

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_gt.h           | 20 ++++++++++++++++++++
 drivers/gpu/drm/xe/xe_gt_pagefault.c | 21 ---------------------
 2 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_gt.h b/drivers/gpu/drm/xe/xe_gt.h
index 4486e083f5ef..51dd288cf1cf 100644
--- a/drivers/gpu/drm/xe/xe_gt.h
+++ b/drivers/gpu/drm/xe/xe_gt.h
@@ -17,6 +17,26 @@
 			  xe_hw_engine_is_valid((hwe__)))
 
 #define CCS_MASK(gt) (((gt)->info.engine_mask & XE_HW_ENGINE_CCS_MASK) >> XE_HW_ENGINE_CCS0)
+enum access_type {
+	ACCESS_TYPE_READ = 0,
+	ACCESS_TYPE_WRITE = 1,
+	ACCESS_TYPE_ATOMIC = 2,
+	ACCESS_TYPE_RESERVED = 3,
+};
+
+struct pagefault {
+	u64 page_addr;
+	u32 asid;
+	u16 pdata;
+	u8 vfid;
+	u8 access_type;
+	u8 fault_type;
+	u8 fault_level;
+	u8 engine_class;
+	u8 engine_instance;
+	u8 fault_unsuccessful;
+	bool trva_fault;
+};
 
 #ifdef CONFIG_FAULT_INJECTION
 #include <linux/fault-inject.h> /* XXX: fault-inject.h is broken */
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 4489aadc7a52..6de1ff195aaa 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -23,27 +23,6 @@
 #include "xe_trace.h"
 #include "xe_vm.h"
 
-struct pagefault {
-	u64 page_addr;
-	u32 asid;
-	u16 pdata;
-	u8 vfid;
-	u8 access_type;
-	u8 fault_type;
-	u8 fault_level;
-	u8 engine_class;
-	u8 engine_instance;
-	u8 fault_unsuccessful;
-	bool trva_fault;
-};
-
-enum access_type {
-	ACCESS_TYPE_READ = 0,
-	ACCESS_TYPE_WRITE = 1,
-	ACCESS_TYPE_ATOMIC = 2,
-	ACCESS_TYPE_RESERVED = 3,
-};
-
 enum fault_type {
 	NOT_PRESENT = 0,
 	WRITE_ACCESS_VIOLATION = 1,
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 19/22] drm/xe/svm: migrate svm range to vram
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (17 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 18/22] drm/xe/svm: Move a few structures to xe_gt.h Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 20/22] drm/xe/svm: Populate svm range Oak Zeng
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Since the source pages of the svm range can be physically
non-contiguous, and the destination vram pages can also be
non-contiguous, there is no easy way to migrate multiple pages per
blitter command. We do page-by-page migration for now.

Migration is best effort. Even if we fail to migrate some pages,
we will try to migrate the remaining pages.
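
For reference, a caller-side sketch of how this helper is expected to be
driven (the caller function is hypothetical; the real caller arrives with
the GPU page fault patch later in this series):

    static int example_migrate_on_gpu_fault(struct xe_svm_range *range,
                                            struct xe_tile *tile)
    {
            struct mm_struct *mm = range->svm->mm;
            struct vm_area_struct *vma;
            int ret = -EFAULT;

            mmap_read_lock(mm);
            vma = find_vma_intersection(mm, range->start, range->end);
            if (vma)
                    ret = svm_migrate_range_to_vram(range, vma, tile);
            mmap_read_unlock(mm);

            return ret;
    }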

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c         |   7 ++
 drivers/gpu/drm/xe/xe_svm.h         |   3 +
 drivers/gpu/drm/xe/xe_svm_migrate.c | 114 ++++++++++++++++++++++++++++
 3 files changed, 124 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 5772bfcf7da4..44d4f4216a93 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -5,12 +5,19 @@
 
 #include <linux/mutex.h>
 #include <linux/mm_types.h>
+#include <linux/interval_tree.h>
+#include <linux/container_of.h>
+#include <linux/types.h>
+#include <linux/migrate.h>
 #include "xe_svm.h"
 #include <linux/hmm.h>
 #include <linux/scatterlist.h>
 #include "xe_pt.h"
 #include "xe_assert.h"
 #include "xe_vm_types.h"
+#include "xe_gt.h"
+#include "xe_migrate.h"
+#include "xe_trace.h"
 
 DEFINE_HASHTABLE(xe_svm_table, XE_MAX_SVM_PROCESS);
 
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 5b3bd2c064f5..659bcb7927d6 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -80,6 +80,9 @@ struct xe_svm_range {
 };
 
 vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf);
+int svm_migrate_range_to_vram(struct xe_svm_range *range,
+							struct vm_area_struct *vma,
+							struct xe_tile *tile);
 void xe_destroy_svm(struct xe_svm *svm);
 struct xe_svm *xe_create_svm(struct xe_vm *vm);
 struct xe_svm *xe_lookup_svm_by_mm(struct mm_struct *mm);
diff --git a/drivers/gpu/drm/xe/xe_svm_migrate.c b/drivers/gpu/drm/xe/xe_svm_migrate.c
index b4df411e04f3..3724ad6c7aea 100644
--- a/drivers/gpu/drm/xe/xe_svm_migrate.c
+++ b/drivers/gpu/drm/xe/xe_svm_migrate.c
@@ -229,3 +229,117 @@ vm_fault_t xe_devm_migrate_to_ram(struct vm_fault *vmf)
 	kvfree(buf);
 	return 0;
 }
+
+
+/**
+ * svm_migrate_range_to_vram() - migrate backing store of a va range to vram
+ * Must be called with mmap_read_lock(mm) held.
+ * @range: the va range to migrate. Range should only belong to one vma.
+ * @vma: the vma that this range belongs to. @range can cover whole @vma
+ * or a sub-range of @vma.
+ * @tile: the destination tile which holds the new backing store of the range
+ *
+ * Return: negative errno on failure, 0 on success
+ */
+int svm_migrate_range_to_vram(struct xe_svm_range *range,
+							struct vm_area_struct *vma,
+							struct xe_tile *tile)
+{
+	struct mm_struct *mm = range->svm->mm;
+	unsigned long start = range->start;
+	unsigned long end = range->end;
+	unsigned long npages = (end - start) >> PAGE_SHIFT;
+	struct xe_mem_region *mr = &tile->mem.vram;
+	struct migrate_vma migrate = {
+		.vma		= vma,
+		.start		= start,
+		.end		= end,
+		.pgmap_owner	= tile->xe->drm.dev,
+		.flags          = MIGRATE_VMA_SELECT_SYSTEM,
+	};
+	struct device *dev = tile->xe->drm.dev;
+	dma_addr_t *src_dma_addr;
+	struct dma_fence *fence;
+	struct page *src_page;
+	LIST_HEAD(blocks);
+	int ret = 0, i;
+	u64 dst_dpa;
+	void *buf;
+
+	mmap_assert_locked(mm);
+	xe_assert(tile->xe, xe_svm_range_belongs_to_vma(mm, range, vma));
+
+	buf = kvcalloc(npages, 2 * sizeof(*migrate.src) + sizeof(*src_dma_addr),
+					GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+	migrate.src = buf;
+	migrate.dst = migrate.src + npages;
+	src_dma_addr = (dma_addr_t *) (migrate.dst + npages);
+	ret = xe_devm_alloc_pages(tile, npages, &blocks, migrate.dst);
+	if (ret)
+		goto kfree_buf;
+
+	ret = migrate_vma_setup(&migrate);
+	if (ret) {
+		drm_err(&tile->xe->drm, "vma setup returned %d for range [%lx - %lx]\n",
+				ret, start, end);
+		goto free_dst_pages;
+	}
+
+	trace_xe_svm_migrate_sram_to_vram(range);
+	/* FIXME: partial migration of a range.
+	 * Print a warning for now. If this message
+	 * is printed, we need to fall back to page-by-page
+	 * migration: only migrate pages with MIGRATE_PFN_MIGRATE
+	 */
+	if (migrate.cpages != npages)
+		drm_warn(&tile->xe->drm, "Partial migration for range [%lx - %lx], range is %ld pages, migrate only %ld pages\n",
+				start, end, npages, migrate.cpages);
+
+	/* Migrate page by page for now.
+	 * Both source and destination pages can be physically non-contiguous,
+	 * so there is no good way to migrate multiple pages per blitter command.
+	 */
+	for (i = 0; i < npages; i++) {
+		src_page = migrate_pfn_to_page(migrate.src[i]);
+		if (unlikely(!src_page || !(migrate.src[i] & MIGRATE_PFN_MIGRATE)))
+			goto free_dst_page;
+
+		xe_assert(tile->xe, !is_zone_device_page(src_page));
+		src_dma_addr[i] = dma_map_page(dev, src_page, 0, PAGE_SIZE, DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(dev, src_dma_addr[i]))) {
+			drm_warn(&tile->xe->drm, "dma map error for host pfn %lx\n", migrate.src[i]);
+			goto free_dst_page;
+		}
+		dst_dpa = vram_pfn_to_dpa(mr, migrate.dst[i]);
+		fence = xe_migrate_svm(tile->migrate, src_dma_addr[i], false,
+				dst_dpa, true, PAGE_SIZE);
+		if (IS_ERR(fence)) {
+			drm_warn(&tile->xe->drm, "migrate host page (pfn: %lx) to vram failed\n",
+					migrate.src[i]);
+			/* Migration is best effort. Even if we fail here, we continue */
+			goto free_dst_page;
+		}
+		/* FIXME: Use the first migration's out fence as the second migration's
+		 * input fence, and so on. Only wait on the out fence of the last
+		 * migration?
+		 */
+		dma_fence_wait(fence, false);
+		dma_fence_put(fence);
+		/* Keep the migrated dst page; it will be installed by
+		 * migrate_vma_pages() below. Only free the dst page on failure.
+		 */
+		continue;
+free_dst_page:
+		xe_devm_page_free(pfn_to_page(migrate.dst[i]));
+		migrate.dst[i] = 0;
+	}
+
+	for (i = 0; i < npages; i++)
+		if (!(dma_mapping_error(dev, src_dma_addr[i])))
+			dma_unmap_page(dev, src_dma_addr[i], PAGE_SIZE, DMA_TO_DEVICE);
+
+	migrate_vma_pages(&migrate);
+	migrate_vma_finalize(&migrate);
+free_dst_pages:
+	if (ret)
+		xe_devm_free_blocks(&blocks);
+kfree_buf:
+	kvfree(buf);
+	return ret;
+}
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 20/22] drm/xe/svm: Populate svm range
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (18 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 19/22] drm/xe/svm: migrate svm range to vram Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 21/22] drm/xe/svm: GPU page fault support Oak Zeng
  2023-12-21  4:38 ` [PATCH 22/22] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Add a helper function svm_populate_range to populate
a svm range. This function calls hmm_range_fault
to read the CPU page tables and populate all pfns of this
virtual address range into an array, saved in hmm_range::
hmm_pfns. This is preparation work for binding a svm range to
the GPU. The hmm_pfns array will be used for the GPU binding.
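
A sketch of how the populated pfns are expected to feed the binding step,
inside a hypothetical in-file bind helper (only svm_populate_range(),
xe_svm_build_sg() and the notifier fields come from this series; the bind
step itself is left as a comment):

    struct hmm_range hmm_range = {};
    struct sg_table st = {};
    int ret;

    ret = svm_populate_range(svm_range, &hmm_range, true);
    if (ret)
            return ret;

    /* hmm_range.hmm_pfns now has one entry per page of the range */
    ret = xe_svm_build_sg(&hmm_range, &st);

    /*
     * ... bind st to the GPU page table, then check
     * mmu_interval_read_retry(&svm_range->notifier,
     * hmm_range.notifier_seq) before committing ...
     */

    kvfree(hmm_range.hmm_pfns);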

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_svm.c | 61 +++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 44d4f4216a93..0c13690a19f5 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -145,3 +145,64 @@ int xe_svm_build_sg(struct hmm_range *range,
 	sg_mark_end(sg);
 	return 0;
 }
+
+/**
+ * svm_populate_range() - Populate physical pages of a virtual address range
+ *
+ * @svm_range: The svm range to populate
+ * @hmm_range: pointer to hmm_range struct. hmm_range->hmm_pfns
+ * will hold the populated pfns.
+ * @write: populate pages with write permission
+ *
+ * This function also reads the mmu notifier sequence number (via
+ * mmu_interval_read_begin), for the purpose of later
+ * comparison (through mmu_interval_read_retry).
+ * This must be called with the mmap read or write lock held.
+ *
+ * This function allocates hmm_range->hmm_pfns; it is the caller's
+ * responsibility to free it.
+ *
+ * Return: 0 for success; negative error code on failure
+ */
+static int svm_populate_range(struct xe_svm_range *svm_range,
+			    struct hmm_range *hmm_range, bool write)
+{
+	unsigned long timeout =
+		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+	unsigned long *pfns, flags = HMM_PFN_REQ_FAULT;
+	u64 npages;
+	int ret;
+
+	mmap_assert_locked(svm_range->svm->mm);
+
+	npages = ((svm_range->end - 1) >> PAGE_SHIFT) -
+						(svm_range->start >> PAGE_SHIFT) + 1;
+	pfns = kvmalloc_array(npages, sizeof(*pfns), GFP_KERNEL);
+	if (unlikely(!pfns))
+		return -ENOMEM;
+
+	if (write)
+		flags |= HMM_PFN_REQ_WRITE;
+
+	memset64((u64 *)pfns, (u64)flags, npages);
+	hmm_range->hmm_pfns = pfns;
+	hmm_range->notifier_seq = mmu_interval_read_begin(&svm_range->notifier);
+	hmm_range->notifier = &svm_range->notifier;
+	hmm_range->start = svm_range->start;
+	hmm_range->end = svm_range->end;
+	hmm_range->pfn_flags_mask = HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE;
+	hmm_range->dev_private_owner = svm_range->svm->vm->xe->drm.dev;
+
+	while (true) {
+		ret = hmm_range_fault(hmm_range);
+		if (time_after(jiffies, timeout))
+			goto free_pfns;
+
+		if (ret == -EBUSY)
+			continue;
+		break;
+	}
+
+free_pfns:
+	if (ret)
+		kvfree(pfns);
+	return ret;
+}
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 21/22] drm/xe/svm: GPU page fault support
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (19 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 20/22] drm/xe/svm: Populate svm range Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  2023-12-21  4:38 ` [PATCH 22/22] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

On a gpu page fault of a virtual address, try to fault the virtual
address range into the gpu page table and let the HW retry the
faulting address.

Right now we always migrate the whole vma which contains the fault
address to GPU. This is subject to change once a more sophisticated
migration policy is introduced: deciding whether to migrate memory to
the GPU or map CPU memory in place, and at what granularity to migrate.

There is a rather complicated locking strategy in this patch. See more
details in xe_svm_doc.h, lock design section.
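
As an illustration only (not something this patch implements), the policy
decision mentioned above might eventually look roughly like the sketch
below. The attr.preferred_region and attr.migration_granularity fields and
the XE_REGION_SMEM constant are hypothetical per-range attributes that do
not exist in this series yet:

static bool svm_should_migrate(struct xe_svm_range *range,
			       struct xe_mem_region *dst_region,
			       bool is_atomic_fault)
{
	/* Some platforms require vram residency for correct atomics */
	if (is_atomic_fault)
		return true;

	/* Hypothetical user hint: keep this range in system memory */
	if (range->attr.preferred_region == XE_REGION_SMEM)
		return false;

	/* Hypothetical user hint: skip ranges below migration granularity */
	if (range->end - range->start < range->attr.migration_granularity)
		return false;

	return true;
}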

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/xe_gt_pagefault.c |   7 ++
 drivers/gpu/drm/xe/xe_svm.c          | 116 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_svm.h          |   6 ++
 drivers/gpu/drm/xe/xe_svm_range.c    |  43 ++++++++++
 4 files changed, 172 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c
index 6de1ff195aaa..0afd312ff154 100644
--- a/drivers/gpu/drm/xe/xe_gt_pagefault.c
+++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c
@@ -22,6 +22,7 @@
 #include "xe_pt.h"
 #include "xe_trace.h"
 #include "xe_vm.h"
+#include "xe_svm.h"
 
 enum fault_type {
 	NOT_PRESENT = 0,
@@ -131,6 +132,11 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 	if (!vm || !xe_vm_in_fault_mode(vm))
 		return -EINVAL;
 
+	if (vm->svm) {
+		ret = xe_svm_handle_gpu_fault(vm, gt, pf);
+		goto put_vm;
+	}
+
 retry_userptr:
 	/*
 	 * TODO: Avoid exclusive lock if VM doesn't have userptrs, or
@@ -219,6 +225,7 @@ static int handle_pagefault(struct xe_gt *gt, struct pagefault *pf)
 		if (ret >= 0)
 			ret = 0;
 	}
+put_vm:
 	xe_vm_put(vm);
 
 	return ret;
diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c
index 0c13690a19f5..1ade8d7f0ab2 100644
--- a/drivers/gpu/drm/xe/xe_svm.c
+++ b/drivers/gpu/drm/xe/xe_svm.c
@@ -12,6 +12,7 @@
 #include "xe_svm.h"
 #include <linux/hmm.h>
 #include <linux/scatterlist.h>
+#include <drm/xe_drm.h>
 #include "xe_pt.h"
 #include "xe_assert.h"
 #include "xe_vm_types.h"
@@ -206,3 +207,118 @@ static int svm_populate_range(struct xe_svm_range *svm_range,
 		kvfree(pfns);
 	return ret;
 }
+
+/**
+ * svm_access_allowed() - Determine whether read and/or write access to a vma is allowed
+ *
+ * @write: true means read and write access; false means read-only access
+ */
+static bool svm_access_allowed(struct vm_area_struct *vma, bool write)
+{
+	unsigned long access = VM_READ;
+
+	if (write)
+		access |= VM_WRITE;
+
+	return (vma->vm_flags & access) == access;
+}
+
+/**
+ * svm_should_migrate() - Determine whether we should migrate a range to
+ * a destination memory region
+ *
+ * @range: The svm memory range to consider
+ * @dst_region: target destination memory region
+ * @is_atomic_fault: Is the intended migration triggered by an atomic access?
+ * On some platforms, we have to migrate memory to guarantee atomic correctness.
+ */
+static bool svm_should_migrate(struct xe_svm_range *range,
+				struct xe_mem_region *dst_region, bool is_atomic_fault)
+{
+	return true;
+}
+
+/**
+ * xe_svm_handle_gpu_fault() - gpu page fault handler for svm subsystem
+ *
+ * @vm: The vm of the fault.
+ * @gt: The gt hardware on which the fault happens.
+ * @pf: page fault descriptor
+ *
+ * Work out backing memory for the fault address, migrate memory from
+ * system memory to gpu vram if necessary, and map the fault address to
+ * the GPU so the GPU HW can retry the last operation which caused the
+ * GPU page fault.
+ */
+int xe_svm_handle_gpu_fault(struct xe_vm *vm,
+				struct xe_gt *gt,
+				struct pagefault *pf)
+{
+	u8 access_type = pf->access_type;
+	u64 page_addr = pf->page_addr;
+	struct hmm_range hmm_range;
+	struct vm_area_struct *vma;
+	struct xe_svm_range *range;
+	struct mm_struct *mm;
+	struct xe_svm *svm;
+	int ret = 0;
+
+	svm = vm->svm;
+	if (!svm)
+		return -EINVAL;
+
+	mm = svm->mm;
+	mmap_read_lock(mm);
+	vma = find_vma_intersection(mm, page_addr, page_addr + 4);
+	if (!vma) {
+		mmap_read_unlock(mm);
+		return -ENOENT;
+	}
+
+	if (!svm_access_allowed(vma, access_type != ACCESS_TYPE_READ)) {
+		mmap_read_unlock(mm);
+		return -EPERM;
+	}
+
+	range = xe_svm_range_from_addr(svm, page_addr);
+	if (!range) {
+		range = xe_svm_range_create(svm, vma);
+		if (!range) {
+			mmap_read_unlock(mm);
+			return -ENOMEM;
+		}
+	}
+
+	if (svm_should_migrate(range, &gt->tile->mem.vram,
+						access_type == ACCESS_TYPE_ATOMIC))
+		/* Migrate the whole svm range for now.
+		 * This is subject to change once we introduce a migration
+		 * granularity parameter for the user to select.
+		 *
+		 * Migration is best effort. If we fail to migrate to vram,
+		 * we just map that range to gpu in system memory. For cases
+		 * such as gpu atomic operations which require memory to be
+		 * resident in vram, we will fault again and retry migration.
+		 */
+		svm_migrate_range_to_vram(range, vma, gt->tile);
+
+	ret = svm_populate_range(range, &hmm_range, vma->vm_flags & VM_WRITE);
+	mmap_read_unlock(mm);
+	/* There is no need to destroy this range; it can be reused later */
+	if (ret)
+		goto free_pfns;
+
+	/* FIXME: set the DM, AE flags in PTE */
+	ret = xe_bind_svm_range(vm, gt->tile, &hmm_range,
+		!(vma->vm_flags & VM_WRITE) ? DRM_XE_VM_BIND_FLAG_READONLY : 0);
+	/* A concurrent cpu page table update happened.
+	 * Return success so we will retry everything
+	 * on the next gpu page fault.
+	 */
+	if (ret == -EAGAIN)
+		ret = 0;
+
+free_pfns:
+	kvfree(hmm_range.hmm_pfns);
+	return ret;
+}
diff --git a/drivers/gpu/drm/xe/xe_svm.h b/drivers/gpu/drm/xe/xe_svm.h
index 659bcb7927d6..a8ff4957a9b8 100644
--- a/drivers/gpu/drm/xe/xe_svm.h
+++ b/drivers/gpu/drm/xe/xe_svm.h
@@ -20,6 +20,7 @@
 
 struct xe_vm;
 struct mm_struct;
+struct pagefault;
 
 #define XE_MAX_SVM_PROCESS 5 /* Maximumly support 32 SVM process*/
 extern DECLARE_HASHTABLE(xe_svm_table, XE_MAX_SVM_PROCESS);
@@ -94,6 +95,8 @@ bool xe_svm_range_belongs_to_vma(struct mm_struct *mm,
 void xe_svm_range_unregister_mmu_notifier(struct xe_svm_range *range);
 int xe_svm_range_register_mmu_notifier(struct xe_svm_range *range);
 void xe_svm_range_prepare_destroy(struct xe_svm_range *range);
+struct xe_svm_range *xe_svm_range_create(struct xe_svm *svm,
+									struct vm_area_struct *vma);
 
 int xe_svm_build_sg(struct hmm_range *range, struct sg_table *st);
 int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mem);
@@ -106,4 +109,7 @@ int xe_devm_alloc_pages(struct xe_tile *tile,
 
 void xe_devm_free_blocks(struct list_head *blocks);
 void xe_devm_page_free(struct page *page);
+int xe_svm_handle_gpu_fault(struct xe_vm *vm,
+				struct xe_gt *gt,
+				struct pagefault *pf);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_svm_range.c b/drivers/gpu/drm/xe/xe_svm_range.c
index dfb4660dc26f..05c088dddc2d 100644
--- a/drivers/gpu/drm/xe/xe_svm_range.c
+++ b/drivers/gpu/drm/xe/xe_svm_range.c
@@ -182,3 +182,46 @@ void xe_svm_range_prepare_destroy(struct xe_svm_range *range)
 	xe_invalidate_svm_range(vm, range->start, length);
 	xe_svm_range_unregister_mmu_notifier(range);
 }
+
+static void add_range_to_svm(struct xe_svm_range *range)
+{
+	range->inode.start = range->start;
+	range->inode.last = range->end;
+	mutex_lock(&range->svm->mutex);
+	interval_tree_insert(&range->inode, &range->svm->range_tree);
+	mutex_unlock(&range->svm->mutex);
+}
+
+/**
+ * xe_svm_range_create() - create and initialize a svm range
+ *
+ * @svm: the svm that the range belongs to
+ * @vma: the corresponding vma of the range
+ *
+ * Create the range, add it to the svm's interval tree, and register
+ * an mmu interval notifier for this range.
+ *
+ * Return: pointer to the created svm range,
+ * or NULL on failure
+ */
+struct xe_svm_range *xe_svm_range_create(struct xe_svm *svm,
+									struct vm_area_struct *vma)
+{
+	struct xe_svm_range *range = kzalloc(sizeof(*range), GFP_KERNEL);
+
+	if (!range)
+		return NULL;
+
+	range->start = vma->vm_start;
+	range->end = vma->vm_end;
+	range->vma = vma;
+	range->svm = svm;
+
+	if (xe_svm_range_register_mmu_notifier(range)) {
+		kfree(range);
+		return NULL;
+	}
+
+	add_range_to_svm(range);
+	return range;
+}
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 22/22] drm/xe/svm: Add DRM_XE_SVM kernel config entry
  2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
                   ` (20 preceding siblings ...)
  2023-12-21  4:38 ` [PATCH 21/22] drm/xe/svm: GPU page fault support Oak Zeng
@ 2023-12-21  4:38 ` Oak Zeng
  21 siblings, 0 replies; 23+ messages in thread
From: Oak Zeng @ 2023-12-21  4:38 UTC (permalink / raw)
  To: dri-devel, intel-xe
  Cc: matthew.brost, Thomas.Hellstrom, niranjana.vishwanathapura, brian.welty

Add a DRM_XE_SVM kernel config entry so the
xe SVM feature can be enabled or disabled at
kernel build time.
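
A possible follow-up, sketched below under the assumption that the function
names and argument types match the rest of this series (return types are
assumed; the stubs themselves are illustrative, not part of this patch), is
to provide no-op stubs in xe_svm.h when the option is disabled, so callers
such as xe_vm_create() and mmio_fini() would not need inline
#if IS_ENABLED(CONFIG_DRM_XE_SVM) guards:

#if IS_ENABLED(CONFIG_DRM_XE_SVM)
struct xe_svm *xe_create_svm(struct xe_vm *vm);
int xe_svm_devm_add(struct xe_tile *tile, struct xe_mem_region *mem);
void xe_svm_devm_remove(struct xe_device *xe, struct xe_mem_region *mem);
#else
static inline struct xe_svm *xe_create_svm(struct xe_vm *vm)
{
	return NULL;
}
static inline int xe_svm_devm_add(struct xe_tile *tile,
				  struct xe_mem_region *mem)
{
	return 0;
}
static inline void xe_svm_devm_remove(struct xe_device *xe,
				      struct xe_mem_region *mem)
{
}
#endif

This would keep the config dependency in one place, which is the usual
kernel pattern for optional subsystems.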

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Co-developed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Cc: Brian Welty <brian.welty@intel.com>
---
 drivers/gpu/drm/xe/Kconfig   | 22 ++++++++++++++++++++++
 drivers/gpu/drm/xe/Makefile  |  5 +++++
 drivers/gpu/drm/xe/xe_mmio.c |  5 +++++
 drivers/gpu/drm/xe/xe_vm.c   |  2 ++
 4 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/xe/Kconfig b/drivers/gpu/drm/xe/Kconfig
index 5b3da06e7ba3..a57f0972e9ae 100644
--- a/drivers/gpu/drm/xe/Kconfig
+++ b/drivers/gpu/drm/xe/Kconfig
@@ -83,6 +83,28 @@ config DRM_XE_FORCE_PROBE
 
 	  Use "!*" to block the probe of the driver for all known devices.
 
+config DRM_XE_SVM
+	bool "Enable Shared Virtual Memory support in xe"
+	depends on DRM_XE
+	depends on ARCH_ENABLE_MEMORY_HOTPLUG
+	depends on ARCH_ENABLE_MEMORY_HOTREMOVE
+	depends on MEMORY_HOTPLUG
+	depends on MEMORY_HOTREMOVE
+	depends on ARCH_HAS_PTE_DEVMAP
+	depends on SPARSEMEM_VMEMMAP
+	depends on ZONE_DEVICE
+	depends on DEVICE_PRIVATE
+	depends on MMU
+	select HMM_MIRROR
+	select MMU_NOTIFIER
+	default y
+	help
+	  Choose this option if you want Shared Virtual Memory (SVM)
+	  support in xe. With SVM, the virtual address space is shared
+	  between the CPU and the GPU: any virtual address, such as one
+	  returned by malloc() or mmap(), a variable on the stack, or a
+	  global memory pointer, can be used by the GPU transparently.
+
 menu "drm/Xe Debugging"
 depends on DRM_XE
 depends on EXPERT
diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index df8601d6a59f..b75bdbc5e42c 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -282,6 +282,11 @@ xe-$(CONFIG_DRM_XE_DISPLAY) += \
 	i915-display/skl_universal_plane.o \
 	i915-display/skl_watermark.o
 
+xe-$(CONFIG_DRM_XE_SVM) += xe_svm.o \
+						   xe_svm_devmem.o \
+						   xe_svm_range.o \
+						   xe_svm_migrate.o
+
 ifeq ($(CONFIG_ACPI),y)
 	xe-$(CONFIG_DRM_XE_DISPLAY) += \
 		i915-display/intel_acpi.o \
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c
index cfe25a3c7059..7c95f675ed92 100644
--- a/drivers/gpu/drm/xe/xe_mmio.c
+++ b/drivers/gpu/drm/xe/xe_mmio.c
@@ -286,7 +286,9 @@ int xe_mmio_probe_vram(struct xe_device *xe)
 		}
 
 		io_size -= min_t(u64, tile_size, io_size);
+#if IS_ENABLED(CONFIG_DRM_XE_SVM)
 		xe_svm_devm_add(tile, &tile->mem.vram);
+#endif
 	}
 
 	xe->mem.vram.actual_physical_size = total_size;
@@ -361,8 +363,11 @@ static void mmio_fini(struct drm_device *drm, void *arg)
 	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
 	if (xe->mem.vram.mapping)
 		iounmap(xe->mem.vram.mapping);
+
+#if IS_ENABLED(CONFIG_DRM_XE_SVM)
 	for_each_tile(tile, xe, id) {
 		xe_svm_devm_remove(xe, &tile->mem.vram);
 	}
+#endif
 }
 
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index 3c301a5c7325..12d82f2fc195 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -1376,7 +1376,9 @@ struct xe_vm *xe_vm_create(struct xe_device *xe, u32 flags)
 		xe->usm.num_vm_in_non_fault_mode++;
 	mutex_unlock(&xe->usm.lock);
 
+#if IS_ENABLED(CONFIG_DRM_XE_SVM)
 	vm->svm = xe_create_svm(vm);
+#endif
 	trace_xe_vm_create(vm);
 
 	return vm;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2023-12-21  4:28 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-21  4:37 [PATCH 00/22] XeKmd basic SVM support Oak Zeng
2023-12-21  4:37 ` [PATCH 01/22] drm/xe/svm: Add SVM document Oak Zeng
2023-12-21  4:37 ` [PATCH 02/22] drm/xe/svm: Add svm key data structures Oak Zeng
2023-12-21  4:37 ` [PATCH 03/22] drm/xe/svm: create xe svm during vm creation Oak Zeng
2023-12-21  4:37 ` [PATCH 04/22] drm/xe/svm: Trace svm creation Oak Zeng
2023-12-21  4:37 ` [PATCH 05/22] drm/xe/svm: add helper to retrieve svm range from address Oak Zeng
2023-12-21  4:37 ` [PATCH 06/22] drm/xe/svm: Introduce a helper to build sg table from hmm range Oak Zeng
2023-12-21  4:37 ` [PATCH 07/22] drm/xe/svm: Add helper for binding hmm range to gpu Oak Zeng
2023-12-21  4:37 ` [PATCH 08/22] drm/xe/svm: Add helper to invalidate svm range from GPU Oak Zeng
2023-12-21  4:37 ` [PATCH 09/22] drm/xe/svm: Remap and provide memmap backing for GPU vram Oak Zeng
2023-12-21  4:38 ` [PATCH 10/22] drm/xe/svm: Introduce svm migration function Oak Zeng
2023-12-21  4:38 ` [PATCH 11/22] drm/xe/svm: implement functions to allocate and free device memory Oak Zeng
2023-12-21  4:38 ` [PATCH 12/22] drm/xe/svm: Trace buddy block allocation and free Oak Zeng
2023-12-21  4:38 ` [PATCH 13/22] drm/xe/svm: Handle CPU page fault Oak Zeng
2023-12-21  4:38 ` [PATCH 14/22] drm/xe/svm: trace svm range migration Oak Zeng
2023-12-21  4:38 ` [PATCH 15/22] drm/xe/svm: Implement functions to register and unregister mmu notifier Oak Zeng
2023-12-21  4:38 ` [PATCH 16/22] drm/xe/svm: Implement the mmu notifier range invalidate callback Oak Zeng
2023-12-21  4:38 ` [PATCH 17/22] drm/xe/svm: clean up svm range during process exit Oak Zeng
2023-12-21  4:38 ` [PATCH 18/22] drm/xe/svm: Move a few structures to xe_gt.h Oak Zeng
2023-12-21  4:38 ` [PATCH 19/22] drm/xe/svm: migrate svm range to vram Oak Zeng
2023-12-21  4:38 ` [PATCH 20/22] drm/xe/svm: Populate svm range Oak Zeng
2023-12-21  4:38 ` [PATCH 21/22] drm/xe/svm: GPU page fault support Oak Zeng
2023-12-21  4:38 ` [PATCH 22/22] drm/xe/svm: Add DRM_XE_SVM kernel config entry Oak Zeng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).