* [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm
@ 2021-08-19 13:36 David Yat Sin
  2021-08-19 13:36 ` [PATCH 01/18] x86/configs: CRIU update release defconfig David Yat Sin
                   ` (17 more replies)
  0 siblings, 18 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:36 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

CRIU is a user-space tool that is widely used for container live migration in datacentres. It can checkpoint a running application and save its complete state, memory contents and all system resources to images on disk, which can be migrated to another machine and restored later. More information on CRIU can be found at https://criu.org/Main_Page

CRIU currently does not support checkpoint/restore of applications that have device files open, so it cannot checkpoint and restore GPU devices, which are very complex and manage their own VRAM privately. CRIU can, however, support external devices through a plugin architecture. This patch series adds initial support for ROCm applications; remaining features will follow. We welcome feedback, especially regarding the APIs, before involving a larger audience. A rough sketch of the intended checkpoint sequence is shown after the list of tested scenarios below.

Our plugin code can be found at https://github.com/RadeonOpenCompute/criu/tree/criu-dev/plugins/amdgpu

We have tested the following scenarios:
-Checkpoint / Restore of a PyTorch (BERT) workload
-kfdtests with queues and events
-Gfx9- and Gfx10-based multi-GPU test systems
-On bare metal and inside a Docker container
-Restoring on a different system
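
For reviewers, here is a rough sketch of the checkpoint sequence the
plugin is expected to follow, using the new ioctls that this series adds
to include/uapi/linux/kfd_ioctl.h. This is illustrative only: error
handling and the CRIU image-file plumbing are omitted, obtaining the
target's /dev/kfd fd is the plugin's job, and the pause = 0 unpause
semantics after the dump are an assumption.

#include <sys/ioctl.h>
#include <linux/kfd_ioctl.h>

/* kfd_fd: the target process's /dev/kfd fd (obtained by the plugin) */
static int checkpoint_kfd(int kfd_fd)
{
	struct kfd_ioctl_criu_process_info_args info = {0};
	struct kfd_ioctl_criu_pause_args pause = { .pause = 1 };
	int ret;

	/* Stage 1: discover object counts and private data sizes */
	ret = ioctl(kfd_fd, AMDKFD_IOC_CRIU_PROCESS_INFO, &info);
	if (ret)
		return ret;

	/* Quiesce the target's queues before dumping */
	ret = ioctl(kfd_fd, AMDKFD_IOC_CRIU_PAUSE, &pause);
	if (ret)
		return ret;

	/*
	 * Stage 2: one AMDKFD_IOC_CRIU_DUMPER call per object type
	 * (KFD_CRIU_OBJECT_TYPE_PROCESS, _BO, ...), with buffers sized
	 * from 'info'; BO contents are drained via /proc/<pid>/mem.
	 */

	/* Unpause once the dump is written (assumed pause = 0 semantics) */
	pause.pause = 0;
	return ioctl(kfd_fd, AMDKFD_IOC_CRIU_PAUSE, &pause);
}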

David Yat Sin (9):
  drm/amdkfd: CRIU Implement KFD pause ioctl
  drm/amdkfd: CRIU add queues support
  drm/amdkfd: CRIU restore queue ids
  drm/amdkfd: CRIU restore sdma id for queues
  drm/amdkfd: CRIU restore queue doorbell id
  drm/amdkfd: CRIU dump and restore queue mqds
  drm/amdkfd: CRIU dump/restore queue control stack
  drm/amdkfd: CRIU dump and restore events
  drm/amdkfd: CRIU implement gpu_id remapping

Rajneesh Bhardwaj (9):
  x86/configs: CRIU update release defconfig
  x86/configs: CRIU update debug rock defconfig
  drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
  drm/amdkfd: CRIU Implement KFD process_info ioctl
  drm/amdkfd: CRIU Implement KFD dumper ioctl
  drm/amdkfd: CRIU Implement KFD restore ioctl
  drm/amdkfd: CRIU Implement KFD resume ioctl
  Revert "drm/amdgpu: Remove verify_access shortcut for KFD BOs"
  drm/amdkfd: CRIU export kfd bos as prime dmabuf objects

 arch/x86/configs/rock-dbg_defconfig           |   53 +-
 arch/x86/configs/rock-rel_defconfig           |   13 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |    5 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   51 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |   27 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h       |    2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 1730 +++++++++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c       |    2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |  187 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |   14 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c       |  254 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |   11 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  |   76 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  |   78 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   |   86 +
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   |   77 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  140 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      |   69 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |   72 +-
 include/uapi/linux/kfd_ioctl.h                |  110 +-
 20 files changed, 2743 insertions(+), 314 deletions(-)

-- 
2.17.1



* [PATCH 01/18] x86/configs: CRIU update release defconfig
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
@ 2021-08-19 13:36 ` David Yat Sin
  2021-08-19 13:36 ` [PATCH 02/18] x86/configs: CRIU update debug rock defconfig David Yat Sin
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:36 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

Update rock-rel_defconfig for monolithic kernel release that enables
CRIU support with kfd.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
(cherry picked from commit 4a6d309a82648a23a4fc0add83013ac6db6187d5)
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 arch/x86/configs/rock-rel_defconfig | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/configs/rock-rel_defconfig b/arch/x86/configs/rock-rel_defconfig
index 16fe62276006..9c46bb890879 100644
--- a/arch/x86/configs/rock-rel_defconfig
+++ b/arch/x86/configs/rock-rel_defconfig
@@ -1045,6 +1045,11 @@ CONFIG_PACKET_DIAG=m
 CONFIG_UNIX=y
 CONFIG_UNIX_SCM=y
 CONFIG_UNIX_DIAG=m
+CONFIG_SMC_DIAG=y
+CONFIG_XDP_SOCKETS_DIAG=y
+CONFIG_INET_MPTCP_DIAG=y
+CONFIG_TIPC_DIAG=y
+CONFIG_VSOCKETS_DIAG=y
 # CONFIG_TLS is not set
 CONFIG_XFRM=y
 CONFIG_XFRM_ALGO=m
@@ -1089,7 +1094,7 @@ CONFIG_NET_FOU=m
 CONFIG_NET_FOU_IP_TUNNELS=y
 CONFIG_INET_AH=m
 CONFIG_INET_ESP=m
-# CONFIG_INET_ESP_OFFLOAD is not set
+CONFIG_INET_ESP_OFFLOAD=m
 # CONFIG_INET_ESPINTCP is not set
 CONFIG_INET_IPCOMP=m
 CONFIG_INET_XFRM_TUNNEL=m
@@ -1097,8 +1102,8 @@ CONFIG_INET_TUNNEL=m
 CONFIG_INET_DIAG=m
 CONFIG_INET_TCP_DIAG=m
 CONFIG_INET_UDP_DIAG=m
-# CONFIG_INET_RAW_DIAG is not set
-# CONFIG_INET_DIAG_DESTROY is not set
+CONFIG_INET_RAW_DIAG=m
+CONFIG_INET_DIAG_DESTROY=m
 CONFIG_TCP_CONG_ADVANCED=y
 CONFIG_TCP_CONG_BIC=m
 CONFIG_TCP_CONG_CUBIC=y
@@ -1126,7 +1131,7 @@ CONFIG_IPV6_ROUTE_INFO=y
 # CONFIG_IPV6_OPTIMISTIC_DAD is not set
 CONFIG_INET6_AH=m
 CONFIG_INET6_ESP=m
-# CONFIG_INET6_ESP_OFFLOAD is not set
+CONFIG_INET6_ESP_OFFLOAD=m
 # CONFIG_INET6_ESPINTCP is not set
 CONFIG_INET6_IPCOMP=m
 CONFIG_IPV6_MIP6=m
-- 
2.17.1



* [PATCH 02/18] x86/configs: CRIU update debug rock defconfig
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
  2021-08-19 13:36 ` [PATCH 01/18] x86/configs: CRIU update release defconfig David Yat Sin
@ 2021-08-19 13:36 ` David Yat Sin
  2021-08-19 13:36 ` [PATCH 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs David Yat Sin
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:36 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

 - Update debug config for Checkpoint-Restore (CR) support
 - Also include necessary options for CR with docker containers.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 arch/x86/configs/rock-dbg_defconfig | 53 ++++++++++++++++++-----------
 1 file changed, 34 insertions(+), 19 deletions(-)

diff --git a/arch/x86/configs/rock-dbg_defconfig b/arch/x86/configs/rock-dbg_defconfig
index 54688993d6e2..87951da7de6a 100644
--- a/arch/x86/configs/rock-dbg_defconfig
+++ b/arch/x86/configs/rock-dbg_defconfig
@@ -236,6 +236,7 @@ CONFIG_BPF_SYSCALL=y
 CONFIG_ARCH_WANT_DEFAULT_BPF_JIT=y
 # CONFIG_BPF_PRELOAD is not set
 # CONFIG_USERFAULTFD is not set
+CONFIG_USERFAULTFD=y
 CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE=y
 CONFIG_KCMP=y
 CONFIG_RSEQ=y
@@ -994,6 +995,11 @@ CONFIG_PACKET_DIAG=y
 CONFIG_UNIX=y
 CONFIG_UNIX_SCM=y
 CONFIG_UNIX_DIAG=y
+CONFIG_SMC_DIAG=y
+CONFIG_XDP_SOCKETS_DIAG=y
+CONFIG_INET_MPTCP_DIAG=y
+CONFIG_TIPC_DIAG=y
+CONFIG_VSOCKETS_DIAG=y
 # CONFIG_TLS is not set
 CONFIG_XFRM=y
 CONFIG_XFRM_ALGO=y
@@ -1031,15 +1037,17 @@ CONFIG_SYN_COOKIES=y
 # CONFIG_NET_IPVTI is not set
 # CONFIG_NET_FOU is not set
 # CONFIG_NET_FOU_IP_TUNNELS is not set
-# CONFIG_INET_AH is not set
-# CONFIG_INET_ESP is not set
-# CONFIG_INET_IPCOMP is not set
-CONFIG_INET_TUNNEL=y
-CONFIG_INET_DIAG=y
-CONFIG_INET_TCP_DIAG=y
-# CONFIG_INET_UDP_DIAG is not set
-# CONFIG_INET_RAW_DIAG is not set
-# CONFIG_INET_DIAG_DESTROY is not set
+CONFIG_INET_AH=m
+CONFIG_INET_ESP=m
+CONFIG_INET_IPCOMP=m
+CONFIG_INET_ESP_OFFLOAD=m
+CONFIG_INET_TUNNEL=m
+CONFIG_INET_XFRM_TUNNEL=m
+CONFIG_INET_DIAG=m
+CONFIG_INET_TCP_DIAG=m
+CONFIG_INET_UDP_DIAG=m
+CONFIG_INET_RAW_DIAG=m
+CONFIG_INET_DIAG_DESTROY=y
 CONFIG_TCP_CONG_ADVANCED=y
 # CONFIG_TCP_CONG_BIC is not set
 CONFIG_TCP_CONG_CUBIC=y
@@ -1064,12 +1072,14 @@ CONFIG_TCP_MD5SIG=y
 CONFIG_IPV6=y
 # CONFIG_IPV6_ROUTER_PREF is not set
 # CONFIG_IPV6_OPTIMISTIC_DAD is not set
-CONFIG_INET6_AH=y
-CONFIG_INET6_ESP=y
-# CONFIG_INET6_ESP_OFFLOAD is not set
-# CONFIG_INET6_ESPINTCP is not set
-# CONFIG_INET6_IPCOMP is not set
-# CONFIG_IPV6_MIP6 is not set
+CONFIG_INET6_AH=m
+CONFIG_INET6_ESP=m
+CONFIG_INET6_ESP_OFFLOAD=m
+CONFIG_INET6_IPCOMP=m
+CONFIG_IPV6_MIP6=m
+CONFIG_INET6_XFRM_TUNNEL=m
+CONFIG_INET_DCCP_DIAG=m
+CONFIG_INET_SCTP_DIAG=m
 # CONFIG_IPV6_ILA is not set
 # CONFIG_IPV6_VTI is not set
 CONFIG_IPV6_SIT=y
@@ -1126,8 +1136,13 @@ CONFIG_NF_CT_PROTO_UDPLITE=y
 # CONFIG_NF_CONNTRACK_SANE is not set
 # CONFIG_NF_CONNTRACK_SIP is not set
 # CONFIG_NF_CONNTRACK_TFTP is not set
-# CONFIG_NF_CT_NETLINK is not set
-# CONFIG_NF_CT_NETLINK_TIMEOUT is not set
+CONFIG_COMPAT_NETLINK_MESSAGES=y
+CONFIG_NF_CT_NETLINK=m
+CONFIG_NF_CT_NETLINK_TIMEOUT=m
+CONFIG_NF_CT_NETLINK_HELPER=m
+CONFIG_NETFILTER_NETLINK_GLUE_CT=y
+CONFIG_SCSI_NETLINK=y
+CONFIG_QUOTA_NETLINK_INTERFACE=y
 CONFIG_NF_NAT=m
 CONFIG_NF_NAT_REDIRECT=y
 CONFIG_NF_NAT_MASQUERADE=y
@@ -1971,7 +1986,7 @@ CONFIG_NETCONSOLE_DYNAMIC=y
 CONFIG_NETPOLL=y
 CONFIG_NET_POLL_CONTROLLER=y
 # CONFIG_RIONET is not set
-# CONFIG_TUN is not set
+CONFIG_TUN=y
 # CONFIG_TUN_VNET_CROSS_LE is not set
 CONFIG_VETH=y
 # CONFIG_NLMON is not set
@@ -3955,7 +3970,7 @@ CONFIG_MANDATORY_FILE_LOCKING=y
 CONFIG_FSNOTIFY=y
 CONFIG_DNOTIFY=y
 CONFIG_INOTIFY_USER=y
-# CONFIG_FANOTIFY is not set
+CONFIG_FANOTIFY=y
 CONFIG_QUOTA=y
 CONFIG_QUOTA_NETLINK_INTERFACE=y
 # CONFIG_PRINT_QUOTA_WARNING is not set
-- 
2.17.1



* [PATCH 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
  2021-08-19 13:36 ` [PATCH 01/18] x86/configs: CRIU update release defconfig David Yat Sin
  2021-08-19 13:36 ` [PATCH 02/18] x86/configs: CRIU update debug rock defconfig David Yat Sin
@ 2021-08-19 13:36 ` David Yat Sin
  2021-08-23 18:57   ` Felix Kuehling
  2021-08-19 13:36 ` [PATCH 04/18] drm/amdkfd: CRIU Implement KFD process_info ioctl David Yat Sin
                   ` (14 subsequent siblings)
  17 siblings, 1 reply; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:36 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

Checkpoint-Restore in Userspace (CRIU) is a powerful tool that can
snapshot a running process and later restore it on the same or a remote
machine, but it expects processes that have a device file (e.g. a GPU)
associated with them to provide the necessary driver support, assisting
CRIU through its extensible plugin interface. Thus, in order to support
checkpoint-restore of any ROCm process, the AMD Radeon Open Compute
kernel driver needs to provide a set of new APIs that expose the
necessary VRAM metadata and its contents to a userspace component (the
CRIU plugin), which can store it in the form of image files.

This introduces new ioctls which will be used to checkpoint-restore any
KFD-bound user process. KFD normally rejects ioctl calls that do not
come from the group leader of the process that opened the device. Since
these ioctls are expected to be called from a KFD CRIU plugin, which is
ptrace-attached to the target and holds CAP_SYS_ADMIN on its file
descriptors, modify KFD to allow such calls.

(API redesign suggested by Felix Kuehling and implemented by David Yat
Sin)
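
To illustrate the permission model, below is a minimal, hypothetical
userspace sketch of a caller that satisfies the
KFD_IOC_FLAG_PTRACE_ATTACHED check: it must be the ptrace parent of the
target's group leader (and must additionally hold CAP_SYS_ADMIN for the
ROOT_ONLY ioctls). CRIU itself obtains the target's /dev/kfd fd through
its own fd-draining machinery; pidfd_getfd(2) is used here purely for
illustration.

#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/kfd_ioctl.h>

/* pid: target pid; remote_fd: /dev/kfd fd number inside the target */
static int pause_target(pid_t pid, int remote_fd)
{
	struct kfd_ioctl_criu_pause_args args = { .pause = 1 };
	int pidfd, kfd;

	/* Become the target's ptrace parent (what ptrace_parent() checks) */
	if (ptrace(PTRACE_SEIZE, pid, NULL, NULL))
		return -1;

	pidfd = syscall(SYS_pidfd_open, pid, 0);
	if (pidfd < 0)
		return -1;
	kfd = syscall(SYS_pidfd_getfd, pidfd, remote_fd, 0);
	if (kfd < 0)
		return -1;

	/* Allowed even though kfd belongs to another process's context */
	return ioctl(kfd, AMDKFD_IOC_CRIU_PAUSE, &args);
}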

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
(cherry picked from commit 72f4907135aed9c037b9f442a6055b51733b518a)
(cherry picked from commit 33ff4953c5352f51d57a77ba8ae6614b7993e70d)
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  70 ++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  69 ++++++++++++++
 include/uapi/linux/kfd_ioctl.h           | 110 ++++++++++++++++++++++-
 3 files changed, 247 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 059c3f1ca27d..a1b60d29aae1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -33,6 +33,7 @@
 #include <linux/time.h>
 #include <linux/mm.h>
 #include <linux/mman.h>
+#include <linux/ptrace.h>
 #include <linux/dma-buf.h>
 #include <asm/processor.h>
 #include "kfd_priv.h"
@@ -1802,6 +1803,44 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
 	return -EPERM;
 }
 #endif
+static int kfd_ioctl_criu_dumper(struct file *filep,
+				struct kfd_process *p, void *data)
+{
+	pr_debug("Inside %s\n", __func__);
+
+	return 0;
+}
+
+static int kfd_ioctl_criu_restorer(struct file *filep,
+				struct kfd_process *p, void *data)
+{
+	pr_debug("Inside %s\n", __func__);
+
+	return 0;
+}
+
+static int kfd_ioctl_criu_pause(struct file *filep, struct kfd_process *p, void *data)
+{
+	pr_debug("Inside %s\n", __func__);
+
+	return 0;
+}
+
+static int kfd_ioctl_criu_resume(struct file *filep,
+				struct kfd_process *p, void *data)
+{
+	pr_debug("Inside %s\n", __func__);
+
+	return 0;
+}
+
+static int kfd_ioctl_criu_process_info(struct file *filep,
+				struct kfd_process *p, void *data)
+{
+	pr_debug("Inside %s\n", __func__);
+
+	return 0;
+}
 
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
 	[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
@@ -1906,6 +1945,21 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
 	AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_XNACK_MODE,
 			kfd_ioctl_set_xnack_mode, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_DUMPER,
+			 kfd_ioctl_criu_dumper, KFD_IOC_FLAG_PTRACE_ATTACHED),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_RESTORER,
+			 kfd_ioctl_criu_restorer, KFD_IOC_FLAG_ROOT_ONLY),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_PROCESS_INFO,
+			 kfd_ioctl_criu_process_info, KFD_IOC_FLAG_PTRACE_ATTACHED),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_RESUME,
+			 kfd_ioctl_criu_resume, KFD_IOC_FLAG_ROOT_ONLY),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_PAUSE,
+			 kfd_ioctl_criu_pause, KFD_IOC_FLAG_PTRACE_ATTACHED),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
@@ -1920,6 +1974,7 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	char *kdata = NULL;
 	unsigned int usize, asize;
 	int retcode = -EINVAL;
+	bool ptrace_attached = false;
 
 	if (nr >= AMDKFD_CORE_IOCTL_COUNT)
 		goto err_i1;
@@ -1945,7 +2000,15 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 	 * processes need to create their own KFD device context.
 	 */
 	process = filep->private_data;
-	if (process->lead_thread != current->group_leader) {
+
+	rcu_read_lock();
+	if ((ioctl->flags & KFD_IOC_FLAG_PTRACE_ATTACHED) &&
+	    ptrace_parent(process->lead_thread) == current)
+		ptrace_attached = true;
+	rcu_read_unlock();
+
+	if (process->lead_thread != current->group_leader
+	    && !ptrace_attached) {
 		dev_dbg(kfd_device, "Using KFD FD in wrong process\n");
 		retcode = -EBADF;
 		goto err_i1;
@@ -1960,6 +2023,11 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 		goto err_i1;
 	}
 
+	/* KFD_IOC_FLAG_ROOT_ONLY is only for CAP_SYS_ADMIN */
+	if (unlikely((ioctl->flags & KFD_IOC_FLAG_ROOT_ONLY) &&
+		     !capable(CAP_SYS_ADMIN)))
+		return -EACCES;
+
 	if (cmd & (IOC_IN | IOC_OUT)) {
 		if (asize <= sizeof(stack_kdata)) {
 			kdata = stack_kdata;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 64552f6b8ba4..768cc3fe95d2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -121,7 +121,35 @@
  */
 #define KFD_QUEUE_DOORBELL_MIRROR_OFFSET 512
 
+/**
+ * enum kfd_ioctl_flags - KFD ioctl flags
+ * Various flags that can be set in &amdkfd_ioctl_desc.flags to control how
+ * userspace can use a given ioctl.
+ */
+enum kfd_ioctl_flags {
+	/**
+	 * @KFD_IOC_FLAG_ROOT_ONLY:
+	 * Certain KFD ioctls such as AMDKFD_IOC_CRIU_RESTORER can potentially
+	 * perform privileged operations and load arbitrary data into MQDs and
+	 * eventually HQD registers when the queue is mapped by HWS. In order
+	 * to prevent this, we perform additional security checks. In other
+	 * cases, certain ioctls such as AMDKFD_IOC_CRIU_RESUME might be called
+	 * by an external process, e.g. the CRIU restore process, for each
+	 * resuming task and thus require elevated privileges.
+	 *
+	 * This is equivalent to callers with the CAP_SYS_ADMIN capability.
+	 */
+	KFD_IOC_FLAG_ROOT_ONLY = BIT(0),
+	/**
+	 * @KFD_IOC_FLAG_PTRACE_ATTACHED:
+	 * Certain KFD ioctls such as AMDKFD_IOC_CRIU_PROCESS_INFO and
+	 * AMDKFD_IOC_CRIU_DUMPER are expected to be called during a checkpoint
+	 * operation triggered by CRIU. Since these are expected to be called
+	 * from a ptrace-attached context, we must authenticate these.
+	 */
+	KFD_IOC_FLAG_PTRACE_ATTACHED = BIT(1),
 
+};
 /*
  * Kernel module parameter to specify maximum number of supported queues per
  * device
@@ -977,6 +1005,47 @@ void kfd_process_set_trap_handler(struct qcm_process_device *qpd,
 				  uint64_t tba_addr,
 				  uint64_t tma_addr);
 
+/* CRIU */
+/*
+ * Need to increment KFD_CRIU_PRIV_VERSION each time a change is made to any of the CRIU private
+ * structures:
+ * kfd_criu_process_priv_data
+ * kfd_criu_device_priv_data
+ * kfd_criu_bo_priv_data
+ * kfd_criu_queue_priv_data
+ * kfd_criu_event_priv_data
+ * kfd_criu_svm_range_priv_data
+ */
+
+#define KFD_CRIU_PRIV_VERSION 1
+
+struct kfd_criu_process_priv_data {
+	uint32_t version;
+};
+
+struct kfd_criu_device_priv_data {
+	/* For future use */
+	uint64_t reserved;
+};
+
+struct kfd_criu_bo_priv_data {
+	uint64_t reserved;
+};
+
+struct kfd_criu_svm_range_priv_data {
+	uint64_t reserved;
+};
+
+struct kfd_criu_queue_priv_data {
+	uint64_t reserved;
+};
+
+struct kfd_criu_event_priv_data {
+	uint64_t reserved;
+};
+
+/* CRIU - End */
+
 /* Queue Context Management */
 int init_queue(struct queue **q, const struct queue_properties *properties);
 void uninit_queue(struct queue *q);
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 3cb5b5dd9f77..19489e2ca58e 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -467,6 +467,99 @@ struct kfd_ioctl_smi_events_args {
 	__u32 anon_fd;	/* from KFD */
 };
 
+struct kfd_criu_process_bucket {
+	__u64 priv_data_offset;
+	__u64 priv_data_size;
+};
+
+struct kfd_criu_device_bucket {
+	__u64 priv_data_offset;
+	__u64 priv_data_size;
+	__u32 user_gpu_id;
+	__u32 actual_gpu_id;
+	__u32 drm_fd;
+	__u32 pad;
+};
+
+struct kfd_criu_bo_bucket {
+	__u64 priv_data_offset;
+	__u64 priv_data_size;
+	__u64 addr;
+	__u64 size;
+	__u64 offset;
+	__u64 restored_offset;
+	__u32 gpu_id;
+	__u32 alloc_flags;
+	__u32 dmabuf_fd;
+	__u32 pad;
+};
+
+struct kfd_criu_queue_bucket {
+	__u64 priv_data_offset;
+	__u64 priv_data_size;
+	__u32 gpu_id;
+	__u32 pad;
+};
+
+struct kfd_criu_event_bucket {
+	__u64 priv_data_offset;
+	__u64 priv_data_size;
+	__u32 gpu_id;
+	__u32 pad;
+};
+
+struct kfd_ioctl_criu_process_info_args {
+	__u64 process_priv_data_size;
+	__u64 bos_priv_data_size;
+	__u64 devices_priv_data_size;
+	__u64 queues_priv_data_size;
+	__u64 events_priv_data_size;
+	__u64 svm_ranges_priv_data_size;
+	__u64 total_bos;
+	__u64 total_svm_ranges;
+	__u32 total_devices;
+	__u32 total_queues;
+	__u32 total_events;
+	__u32 task_pid;
+};
+
+struct kfd_ioctl_criu_pause_args {
+	__u32 pause;
+	__u32 pad;
+};
+
+enum kfd_criu_object_type {
+	KFD_CRIU_OBJECT_TYPE_PROCESS	= 0,
+	KFD_CRIU_OBJECT_TYPE_DEVICE	= 1,
+	KFD_CRIU_OBJECT_TYPE_BO		= 2,
+	KFD_CRIU_OBJECT_TYPE_QUEUE	= 3,
+	KFD_CRIU_OBJECT_TYPE_EVENT	= 4,
+	KFD_CRIU_OBJECT_TYPE_SVM_RANGE	= 5,
+};
+
+struct kfd_ioctl_criu_dumper_args {
+	__u64 num_objects;
+	__u64 objects;
+	__u64 objects_size;
+	__u64 objects_index_start;
+	__u32 type; /* enum kfd_criu_object_type */
+	__u32 pad;
+};
+
+struct kfd_ioctl_criu_restorer_args {
+	__u64 num_objects;
+	__u64 objects;
+	__u64 objects_size;
+	__u64 objects_index_start;
+	__u32 type; /* enum kfd_criu_object_type */
+	__u32 pad;
+};
+
+struct kfd_ioctl_criu_resume_args {
+	__u32 pid;	/* to KFD */
+	__u32 pad;
+};
+
 /* Register offset inside the remapped mmio page
  */
 enum kfd_mmio_remap {
@@ -740,7 +833,22 @@ struct kfd_ioctl_set_xnack_mode_args {
 #define AMDKFD_IOC_SET_XNACK_MODE		\
 		AMDKFD_IOWR(0x21, struct kfd_ioctl_set_xnack_mode_args)
 
+#define AMDKFD_IOC_CRIU_DUMPER			\
+		AMDKFD_IOWR(0x22, struct kfd_ioctl_criu_dumper_args)
+
+#define AMDKFD_IOC_CRIU_RESTORER		\
+		AMDKFD_IOWR(0x23, struct kfd_ioctl_criu_restorer_args)
+
+#define AMDKFD_IOC_CRIU_PROCESS_INFO		\
+		AMDKFD_IOWR(0x24, struct kfd_ioctl_criu_process_info_args)
+
+#define AMDKFD_IOC_CRIU_RESUME			\
+		AMDKFD_IOWR(0x25, struct kfd_ioctl_criu_resume_args)
+
+#define AMDKFD_IOC_CRIU_PAUSE			\
+		AMDKFD_IOWR(0x26, struct kfd_ioctl_criu_pause_args)
+
 #define AMDKFD_COMMAND_START		0x01
-#define AMDKFD_COMMAND_END		0x22
+#define AMDKFD_COMMAND_END		0x27
 
 #endif
-- 
2.17.1



* [PATCH 04/18] drm/amdkfd: CRIU Implement KFD process_info ioctl
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (2 preceding siblings ...)
  2021-08-19 13:36 ` [PATCH 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs David Yat Sin
@ 2021-08-19 13:36 ` David Yat Sin
  2021-08-19 13:37 ` [PATCH 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl David Yat Sin
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:36 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

This IOCTL is expected to be called as a precursor to the actual
checkpoint operation. It performs basic discovery of the target process
seized by CRIU and relays the information to userspace, which uses it
to start the checkpoint operation via another dedicated IOCTL.

The process_info IOCTL determines the number of GPUs and buffer objects
that are associated with the target process, and returns that process's
pid in the caller's namespace: the /proc/pid/mem interface may be used
to drain the contents of the discovered buffer objects in userspace,
and getpid() in the plugin returns the pid of the CRIU dumper process,
not the target. The pid of a process inside a container may also differ
from its global pid, so return the namespace pid.
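
As a hedged sketch of how the plugin might consume this ioctl (field
and struct names are from the uapi added in patch 03; buffer
management and error reporting are simplified):

#include <errno.h>
#include <sys/ioctl.h>
#include <linux/kfd_ioctl.h>

static int query_target(int kfd_fd,
			struct kfd_ioctl_criu_process_info_args *info)
{
	if (ioctl(kfd_fd, AMDKFD_IOC_CRIU_PROCESS_INFO, info))
		return -errno;

	/*
	 * info->task_pid is the target's pid in the caller's namespace,
	 * usable with /proc/<pid>/mem to drain BO contents. A later BO
	 * dump needs an objects buffer of
	 *   info->total_bos * sizeof(struct kfd_criu_bo_bucket)
	 *     + info->bos_priv_data_size
	 * bytes.
	 */
	return 0;
}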

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
(cherry picked from commit b2fa92d0a8f1de51013cd6742b4996b38c285ffc)
(cherry picked from commit 8b44c466ce53162603cd8ae49624462902541a47)
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 14 ++++++++
 3 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index a1b60d29aae1..09e2d30515e2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1803,6 +1803,27 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
 	return -EPERM;
 }
 #endif
+
+uint64_t get_process_num_bos(struct kfd_process *p)
+{
+	uint64_t num_of_bos = 0, i;
+
+	/* Run over all PDDs of the process */
+	for (i = 0; i < p->n_pdds; i++) {
+		struct kfd_process_device *pdd = p->pdds[i];
+		void *mem;
+		int id;
+
+		idr_for_each_entry(&pdd->alloc_idr, mem, id) {
+			struct kgd_mem *kgd_mem = (struct kgd_mem *)mem;
+
+			if ((uint64_t)kgd_mem->va > pdd->gpuvm_base)
+				num_of_bos++;
+		}
+	}
+	return num_of_bos;
+}
+
 static int kfd_ioctl_criu_dumper(struct file *filep,
 				struct kfd_process *p, void *data)
 {
@@ -1837,9 +1858,30 @@ static int kfd_ioctl_criu_resume(struct file *filep,
 static int kfd_ioctl_criu_process_info(struct file *filep,
 				struct kfd_process *p, void *data)
 {
+	struct kfd_ioctl_criu_process_info_args *args = data;
+	int ret = 0;
+
 	pr_debug("Inside %s\n", __func__);
+	mutex_lock(&p->mutex);
 
-	return 0;
+	if (!kfd_has_process_device_data(p)) {
+		pr_err("No pdd for given process\n");
+		ret = -ENODEV;
+		goto err_unlock;
+	}
+
+	args->task_pid = task_pid_nr_ns(p->lead_thread,
+					task_active_pid_ns(p->lead_thread));
+
+	args->process_priv_data_size = sizeof(struct kfd_criu_process_priv_data);
+
+	args->total_bos = get_process_num_bos(p);
+	args->bos_priv_data_size = args->total_bos * sizeof(struct kfd_criu_bo_priv_data);
+
+	dev_dbg(kfd_device, "Num of bos:%llu\n", args->total_bos);
+err_unlock:
+	mutex_unlock(&p->mutex);
+	return ret;
 }
 
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 768cc3fe95d2..4e390006b4b6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -932,6 +932,8 @@ void *kfd_process_device_translate_handle(struct kfd_process_device *p,
 void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
 					int handle);
 
+bool kfd_has_process_device_data(struct kfd_process *p);
+
 /* PASIDs */
 int kfd_pasid_init(void);
 void kfd_pasid_exit(void);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 9d4f527bda7c..bc133c3789d8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1359,6 +1359,20 @@ static int init_doorbell_bitmap(struct qcm_process_device *qpd,
 	return 0;
 }
 
+bool kfd_has_process_device_data(struct kfd_process *p)
+{
+	int i;
+
+	for (i = 0; i < p->n_pdds; i++) {
+		struct kfd_process_device *pdd = p->pdds[i];
+
+		if (pdd)
+			return true;
+	}
+
+	return false;
+}
+
 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
 							struct kfd_process *p)
 {
-- 
2.17.1



* [PATCH 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (3 preceding siblings ...)
  2021-08-19 13:36 ` [PATCH 04/18] drm/amdkfd: CRIU Implement KFD process_info ioctl David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-23 18:53   ` Felix Kuehling
  2021-08-19 13:37 ` [PATCH 06/18] drm/amdkfd: CRIU Implement KFD restore ioctl David Yat Sin
                   ` (12 subsequent siblings)
  17 siblings, 1 reply; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

This adds support to discover the buffer objects that belong to a
process being checkpointed. The data corresponding to these buffer
objects is returned to the userspace plugin running under the CRIU
master context, which then stores this information so that the buffer
objects can be recreated during a restore operation.
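
The objects buffer handed to the dumper is laid out as all buckets
first, then all private-data blobs (see criu_dump_bos below). A hedged
userspace sketch of the BO dump call, with sizes taken from the earlier
process_info call:

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kfd_ioctl.h>

static int dump_bos(int kfd_fd, struct kfd_ioctl_criu_process_info_args *info)
{
	struct kfd_ioctl_criu_dumper_args args = {0};
	void *buf;
	int ret;

	/* Layout: [bucket 0..N-1][priv 0..N-1] */
	args.type = KFD_CRIU_OBJECT_TYPE_BO;
	args.num_objects = info->total_bos;
	args.objects_size = info->total_bos * sizeof(struct kfd_criu_bo_bucket)
			    + info->bos_priv_data_size;

	buf = calloc(1, args.objects_size);
	if (!buf)
		return -ENOMEM;
	args.objects = (uintptr_t)buf;

	ret = ioctl(kfd_fd, AMDKFD_IOC_CRIU_DUMPER, &args);
	/* On success, buf holds the buckets and private data to be
	 * written into the CRIU image file. */
	free(buf);
	return ret;
}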

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
(cherry picked from commit 1f114a541bd21873de905db64bb9efa673274d4b)
(cherry picked from commit 20c435fad57d3201e5402e38ae778f1f0f84a09d)
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  20 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 182 ++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |   3 +-
 4 files changed, 204 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 7e7d8330d64b..99ea29fd12bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1181,6 +1181,26 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev,
 	return ttm_pool_free(&adev->mman.bdev.pool, ttm);
 }
 
+/**
+ * amdgpu_ttm_tt_get_userptr - Return the userptr GTT ttm_tt for the current
+ * task
+ *
+ * @tbo: The ttm_buffer_object that contains the userptr
+ * @user_addr:  The returned value
+ */
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+			      uint64_t *user_addr)
+{
+	struct amdgpu_ttm_tt *gtt;
+
+	if (!tbo->ttm)
+		return -EINVAL;
+
+	gtt = (void *)tbo->ttm;
+	*user_addr = gtt->userptr;
+	return 0;
+}
+
 /**
  * amdgpu_ttm_tt_set_userptr - Initialize userptr GTT ttm_tt for the current
  * task
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
index 9e38475e0f8d..dddd76f7a92e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
@@ -168,6 +168,8 @@ static inline bool amdgpu_ttm_tt_get_user_pages_done(struct ttm_tt *ttm)
 #endif
 
 void amdgpu_ttm_tt_set_user_pages(struct ttm_tt *ttm, struct page **pages);
+int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
+			      uint64_t *user_addr);
 int amdgpu_ttm_tt_set_userptr(struct ttm_buffer_object *bo,
 			      uint64_t addr, uint32_t flags);
 bool amdgpu_ttm_tt_has_userptr(struct ttm_tt *ttm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 09e2d30515e2..d548e6691d69 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -42,6 +42,7 @@
 #include "kfd_svm.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
+#include "amdgpu_object.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1804,6 +1805,44 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
 }
 #endif
 
+static int criu_dump_process(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
+{
+	int ret;
+	struct kfd_criu_process_bucket *process_bucket;
+	struct kfd_criu_process_priv_data *process_priv;
+
+	if (args->num_objects != 1) {
+		pr_err("Only 1 process supported\n");
+		return -EINVAL;
+	}
+
+	if (args->objects_size != sizeof(*process_bucket) + sizeof(*process_priv)) {
+		pr_err("Invalid objects size for process\n");
+		return -EINVAL;
+	}
+
+	process_bucket = kzalloc(args->objects_size, GFP_KERNEL);
+	if (!process_bucket)
+		return -ENOMEM;
+
+	/* Private data starts after process bucket */
+	process_priv = (void *)(process_bucket + 1);
+
+	process_priv->version = KFD_CRIU_PRIV_VERSION;
+
+	process_bucket->priv_data_offset = 0;
+	process_bucket->priv_data_size = sizeof(*process_priv);
+
+	ret = copy_to_user((void __user *)args->objects, process_bucket, args->objects_size);
+	if (ret) {
+		pr_err("Failed to copy process information to user\n");
+		ret = -EFAULT;
+	}
+
+	kfree(process_bucket);
+	return ret;
+}
+
 uint64_t get_process_num_bos(struct kfd_process *p)
 {
 	uint64_t num_of_bos = 0, i;
@@ -1824,12 +1863,151 @@ uint64_t get_process_num_bos(struct kfd_process *p)
 	return num_of_bos;
 }
 
+static int criu_dump_bos(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
+{
+	struct kfd_criu_bo_bucket *bo_buckets;
+	struct kfd_criu_bo_priv_data *bo_privs;
+	uint64_t num_bos;
+
+	int ret = 0, pdd_index, bo_index = 0, id;
+	void *mem;
+
+	num_bos = get_process_num_bos(p);
+
+	if (args->num_objects != num_bos) {
+		pr_err("Mismatch with number of BOs (current:%lld user:%lld)\n",
+				num_bos, args->num_objects);
+		return -EINVAL;
+	}
+
+	if (args->objects_size != args->num_objects * (sizeof(*bo_buckets) + sizeof(*bo_privs))) {
+		pr_err("Invalid objects size for BOs\n");
+		return -EINVAL;
+	}
+
+	bo_buckets = kvzalloc(args->objects_size, GFP_KERNEL);
+	if (!bo_buckets)
+		return -ENOMEM;
+
+	/* Private data for first BO starts after all bo_buckets */
+	bo_privs = (void *)(bo_buckets + args->num_objects);
+
+	for (pdd_index = 0; pdd_index < p->n_pdds; pdd_index++) {
+		struct kfd_process_device *pdd = p->pdds[pdd_index];
+		struct amdgpu_bo *dumper_bo;
+		struct kgd_mem *kgd_mem;
+
+		idr_for_each_entry(&pdd->alloc_idr, mem, id) {
+			struct kfd_criu_bo_bucket *bo_bucket;
+			struct kfd_criu_bo_priv_data *bo_priv;
+
+			if (!mem) {
+				ret = -ENOMEM;
+				goto exit;
+			}
+
+			kgd_mem = (struct kgd_mem *)mem;
+			dumper_bo = kgd_mem->bo;
+
+			if ((uint64_t)kgd_mem->va <= pdd->gpuvm_base)
+				continue;
+
+			bo_bucket = &bo_buckets[bo_index];
+			bo_priv = &bo_privs[bo_index];
+
+			bo_bucket->addr = (uint64_t)kgd_mem->va;
+			bo_bucket->size = amdgpu_bo_size(dumper_bo);
+			bo_bucket->gpu_id = pdd->dev->id;
+			bo_bucket->alloc_flags = (uint32_t)kgd_mem->alloc_flags;
+
+			bo_bucket->priv_data_offset = bo_index * sizeof(*bo_priv);
+			bo_bucket->priv_data_size = sizeof(*bo_priv);
+
+			bo_priv->idr_handle = id;
+			if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
+				ret = amdgpu_ttm_tt_get_userptr(&dumper_bo->tbo,
+								&bo_priv->user_addr);
+				if (ret) {
+					pr_err("Failed to obtain user address for user-pointer bo\n");
+					goto exit;
+				}
+			}
+			if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL)
+				bo_bucket->offset = KFD_MMAP_TYPE_DOORBELL |
+					KFD_MMAP_GPU_ID(pdd->dev->id);
+			else if (bo_bucket->alloc_flags &
+				KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)
+				bo_bucket->offset = KFD_MMAP_TYPE_MMIO |
+					KFD_MMAP_GPU_ID(pdd->dev->id);
+			else
+				bo_bucket->offset = amdgpu_bo_mmap_offset(dumper_bo);
+
+			pr_debug("bo_size = 0x%llx, bo_addr = 0x%llx bo_offset = 0x%llx\n"
+					"gpu_id = 0x%x alloc_flags = 0x%x idr_handle = 0x%x",
+					bo_bucket->size,
+					bo_bucket->addr,
+					bo_bucket->offset,
+					bo_bucket->gpu_id,
+					bo_bucket->alloc_flags,
+					bo_priv->idr_handle);
+			bo_index++;
+		}
+	}
+
+	ret = copy_to_user((void __user *)args->objects, bo_buckets, args->objects_size);
+	if (ret) {
+		pr_err("Failed to copy bo information to user\n");
+		ret = -EFAULT;
+	}
+
+exit:
+	kvfree(bo_buckets);
+	return ret;
+}
+
 static int kfd_ioctl_criu_dumper(struct file *filep,
 				struct kfd_process *p, void *data)
 {
-	pr_debug("Inside %s\n", __func__);
+	struct kfd_ioctl_criu_dumper_args *args = data;
+	int ret;
 
-	return 0;
+	pr_debug("CRIU dump type:%d\n", args->type);
+
+	if (!args->objects || !args->objects_size)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	if (!kfd_has_process_device_data(p)) {
+		pr_err("No pdd for given process\n");
+		ret = -ENODEV;
+		goto err_unlock;
+	}
+
+	switch (args->type) {
+	case KFD_CRIU_OBJECT_TYPE_PROCESS:
+		ret = criu_dump_process(p, args);
+		break;
+	case KFD_CRIU_OBJECT_TYPE_BO:
+		ret = criu_dump_bos(p, args);
+		break;
+	case KFD_CRIU_OBJECT_TYPE_QUEUE:
+	case KFD_CRIU_OBJECT_TYPE_EVENT:
+	case KFD_CRIU_OBJECT_TYPE_DEVICE:
+	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
+	default:
+		pr_err("Unsupported object type:%d\n", args->type);
+		ret = -EINVAL;
+	}
+
+err_unlock:
+	mutex_unlock(&p->mutex);
+	if (ret)
+		pr_err("Failed to dump CRIU type:%d ret:%d\n", args->type, ret);
+	else
+		pr_debug("CRIU dump type:%d ret:%d\n", args->type, ret);
+
+	return ret;
 }
 
 static int kfd_ioctl_criu_restorer(struct file *filep,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 4e390006b4b6..8c9f2b3ac85d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1031,7 +1031,8 @@ struct kfd_criu_device_priv_data {
 };
 
 struct kfd_criu_bo_priv_data {
-	uint64_t reserved;
+	uint64_t user_addr;
+	uint32_t idr_handle;
 };
 
 struct kfd_criu_svm_range_priv_data {
-- 
2.17.1



* [PATCH 06/18] drm/amdkfd: CRIU Implement KFD restore ioctl
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (4 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-19 13:37 ` [PATCH 07/18] drm/amdkfd: CRIU Implement KFD resume ioctl David Yat Sin
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

This implements the KFD CRIU restore ioctl that lays the basic
foundation for the CRIU restore operation. It provides support to
create the buffer objects corresponding to non-paged system memory
mapped for GPU and/or CPU access, and lays the basic foundation for
userptr buffer objects, which will be added in a separate patch.
This ioctl creates various types of buffer objects, such as VRAM,
MMIO, doorbell and GTT, based on the data sent from the userspace
plugin. The data mostly contains the previously checkpointed KFD
images from some KFD process.

While restoring a CRIU process, attach the old IDR values to the newly
created BOs. This also adds minimal GPU mapping support for the
single-GPU checkpoint-restore use case.
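
On restore, the plugin feeds the same buckets-then-private-data layout
back in, and the kernel copies the buckets back with restored_offset
filled in, which the plugin uses to re-establish the mapping at the
checkpointed VA. A hedged sketch (which fd gets mmapped depends on the
BO type: doorbell and MMIO offsets encode a KFD mmap type, as seen in
the code below):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kfd_ioctl.h>

static int restore_bos(int kfd_fd, void *buf, __u64 num_bos,
		       __u64 objects_size)
{
	struct kfd_ioctl_criu_restorer_args args = {
		.type = KFD_CRIU_OBJECT_TYPE_BO,
		.num_objects = num_bos,
		.objects = (uintptr_t)buf,
		.objects_size = objects_size,
	};
	int ret;

	ret = ioctl(kfd_fd, AMDKFD_IOC_CRIU_RESTORER, &args);
	if (ret)
		return ret;

	/*
	 * The kernel has copied the buckets back:
	 * ((struct kfd_criu_bo_bucket *)buf)[i].restored_offset is the
	 * new mmap offset for remapping the BO at its original VA,
	 * ((struct kfd_criu_bo_bucket *)buf)[i].addr.
	 */
	return 0;
}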

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
(cherry picked from commit 47bb685701c336d1fde7e91be93d9cabe89a4c1b)
(cherry picked from commit b71ba8158a7ddf9e4fd8d872be4e40ddd9a29b4f)
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 284 ++++++++++++++++++++++-
 1 file changed, 282 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d548e6691d69..2dab1845f9d3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2010,12 +2010,292 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
 	return ret;
 }
 
+static int criu_restore_process(struct kfd_process *p, struct kfd_ioctl_criu_restorer_args *args)
+{
+	int ret = 0;
+	uint8_t *objects;
+	struct kfd_criu_process_bucket *process_bucket;
+	struct kfd_criu_process_priv_data *process_priv;
+
+	if (args->num_objects != 1) {
+		pr_err("Only 1 process supported\n");
+		return -EINVAL;
+	}
+
+	if (args->objects_size != sizeof(*process_bucket) + sizeof(*process_priv)) {
+		pr_err("Invalid objects size for process\n");
+		return -EINVAL;
+	}
+
+	objects = kmalloc(args->objects_size, GFP_KERNEL);
+	if (!objects)
+		return -ENOMEM;
+
+	ret = copy_from_user(objects, (void __user *)args->objects, args->objects_size);
+	if (ret) {
+		pr_err("Failed to copy process information from user\n");
+		ret = -EFAULT;
+		goto exit;
+	}
+
+	process_bucket = (struct kfd_criu_process_bucket *)objects;
+	/* Private data starts after process bucket */
+	process_priv = (struct kfd_criu_process_priv_data *)
+			(objects + sizeof(*process_bucket) + process_bucket->priv_data_offset);
+
+	if (process_priv->version != KFD_CRIU_PRIV_VERSION) {
+		pr_err("Invalid CRIU API version (checkpointed:%d current:%d)\n",
+			process_priv->version, KFD_CRIU_PRIV_VERSION);
+		ret = -EINVAL;
+	}
+
+exit:
+	kfree(objects);
+	return ret;
+}
+
+static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restorer_args *args)
+{
+	uint8_t *objects, *private_data;
+	struct kfd_criu_bo_bucket *bo_buckets;
+	int ret = 0, i, j = 0;
+
+	if (args->objects_size != args->num_objects *
+		(sizeof(*bo_buckets) + sizeof(struct kfd_criu_bo_priv_data))) {
+		pr_err("Invalid objects size for BOs\n");
+		return -EINVAL;
+	}
+
+	objects = kmalloc(args->objects_size, GFP_KERNEL);
+	if (!objects)
+		return -ENOMEM;
+
+	ret = copy_from_user(objects, (void __user *)args->objects, args->objects_size);
+	if (ret) {
+		pr_err("Failed to copy BOs information from user\n");
+		ret = -EFAULT;
+		goto exit;
+	}
+
+	bo_buckets = (struct kfd_criu_bo_bucket *) objects;
+	/* Private data for first BO starts after all bo_buckets */
+	private_data = (void *)(bo_buckets + args->num_objects);
+
+	/* Create and map new BOs */
+	for (i = 0; i < args->num_objects; i++) {
+		struct kfd_criu_bo_bucket *bo_bucket;
+		struct kfd_criu_bo_priv_data *bo_priv;
+		struct kfd_dev *dev;
+		struct kfd_process_device *pdd;
+		void *mem;
+		u64 offset;
+		int idr_handle;
+
+		bo_bucket = &bo_buckets[i];
+		bo_priv = (struct kfd_criu_bo_priv_data *)
+				(private_data + bo_bucket->priv_data_offset);
+
+		dev = kfd_device_by_id(bo_bucket->gpu_id);
+		if (!dev) {
+			ret = -EINVAL;
+			pr_err("Failed to get pdd\n");
+			goto exit;
+		}
+		pdd = kfd_get_process_device_data(dev, p);
+		if (!pdd) {
+			ret = -EINVAL;
+			pr_err("Failed to get pdd\n");
+			goto exit;
+		}
+
+		pr_debug("kfd restore ioctl - bo_bucket[%d]:\n", i);
+		pr_debug("size = 0x%llx, bo_addr = 0x%llx bo_offset = 0x%llx\n"
+			"gpu_id = 0x%x alloc_flags = 0x%x\n"
+			"idr_handle = 0x%x\n",
+			bo_bucket->size,
+			bo_bucket->addr,
+			bo_bucket->offset,
+			bo_bucket->gpu_id,
+			bo_bucket->alloc_flags,
+			bo_priv->idr_handle);
+
+		if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL) {
+			pr_debug("restore ioctl: KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL\n");
+			if (bo_bucket->size != kfd_doorbell_process_slice(dev)) {
+				ret = -EINVAL;
+				goto exit;
+			}
+			offset = kfd_get_process_doorbells(pdd);
+		} else if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP) {
+			/* MMIO BOs need remapped bus address */
+			pr_debug("restore ioctl :KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP\n");
+			if (bo_bucket->size != PAGE_SIZE) {
+				pr_err("Invalid page size\n");
+				ret = -EINVAL;
+				goto exit;
+			}
+			offset = amdgpu_amdkfd_get_mmio_remap_phys_addr(dev->kgd);
+			if (!offset) {
+				pr_err("amdgpu_amdkfd_get_mmio_remap_phys_addr failed\n");
+				ret = -ENOMEM;
+				goto exit;
+			}
+		} else if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
+			offset = bo_priv->user_addr;
+		}
+
+		/* Create the BO */
+		ret = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(dev->kgd,
+						bo_bucket->addr,
+						bo_bucket->size,
+						pdd->drm_priv,
+						(struct kgd_mem **) &mem,
+						&offset,
+						bo_bucket->alloc_flags);
+		if (ret) {
+			pr_err("Could not create the BO\n");
+			ret = -ENOMEM;
+			goto exit;
+		}
+		pr_debug("New BO created: size = 0x%llx, bo_addr = 0x%llx bo_offset = 0x%llx\n",
+			bo_bucket->size, bo_bucket->addr, offset);
+
+		/* Restore the previous IDR handle */
+		pr_debug("Restoring old IDR handle for the BO");
+		idr_handle = idr_alloc(&pdd->alloc_idr, mem,
+				       bo_priv->idr_handle,
+				       bo_priv->idr_handle + 1, GFP_KERNEL);
+		if (idr_handle < 0) {
+			pr_err("Could not allocate idr\n");
+			amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd,
+						(struct kgd_mem *)mem,
+						pdd->drm_priv, NULL);
+
+			ret = -ENOMEM;
+			goto exit;
+		}
+
+		if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL)
+			bo_bucket->restored_offset = KFD_MMAP_TYPE_DOORBELL |
+				KFD_MMAP_GPU_ID(pdd->dev->id);
+		if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP) {
+			bo_bucket->restored_offset = KFD_MMAP_TYPE_MMIO |
+				KFD_MMAP_GPU_ID(pdd->dev->id);
+		} else if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_GTT) {
+			bo_bucket->restored_offset = offset;
+			pr_debug("updating offset for GTT\n");
+		} else if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+			bo_bucket->restored_offset = offset;
+			/* Update the VRAM usage count */
+			WRITE_ONCE(pdd->vram_usage, pdd->vram_usage + bo_bucket->size);
+			pr_debug("updating offset for VRAM\n");
+		}
+
+		/* now map these BOs to GPU/s */
+		for (j = 0; j < p->n_pdds; j++) {
+			struct kfd_process_device *pdd = p->pdds[j];
+			struct kfd_dev *peer;
+			struct kfd_process_device *peer_pdd;
+
+			peer = kfd_device_by_id(pdd->dev->id);
+
+			pr_debug("Inside mapping loop with desired gpu_id = 0x%x\n",
+							pdd->dev->id);
+			if (!peer) {
+				pr_debug("Getting device by id failed for 0x%x\n",
+						pdd->dev->id);
+				ret = -EINVAL;
+				goto exit;
+			}
+
+			peer_pdd = kfd_bind_process_to_device(peer, p);
+			if (IS_ERR(peer_pdd)) {
+				ret = PTR_ERR(peer_pdd);
+				goto exit;
+			}
+			pr_debug("map mem in restore ioctl -> 0x%llx\n",
+				 ((struct kgd_mem *)mem)->va);
+			ret = amdgpu_amdkfd_gpuvm_map_memory_to_gpu(peer->kgd,
+				(struct kgd_mem *)mem, peer_pdd->drm_priv);
+			if (ret) {
+				pr_err("Failed to map to gpu %d/%d\n",
+				j, p->n_pdds);
+				goto exit;
+			}
+		}
+
+		ret = amdgpu_amdkfd_gpuvm_sync_memory(dev->kgd,
+						      (struct kgd_mem *) mem, true);
+		if (ret) {
+			pr_debug("Sync memory failed, wait interrupted by user signal\n");
+			goto exit;
+		}
+
+		pr_debug("map memory was successful for the BO\n");
+	} /* done */
+
+	/* Flush TLBs after waiting for the page table updates to complete */
+	for (j = 0; j < p->n_pdds; j++) {
+		struct kfd_dev *peer;
+		struct kfd_process_device *pdd = p->pdds[j];
+		struct kfd_process_device *peer_pdd;
+
+		peer = kfd_device_by_id(pdd->dev->id);
+		if (WARN_ON_ONCE(!peer))
+			continue;
+		peer_pdd = kfd_get_process_device_data(peer, p);
+		if (WARN_ON_ONCE(!peer_pdd))
+			continue;
+		kfd_flush_tlb(peer_pdd);
+	}
+	/* Copy only the buckets back so user can read bo_buckets[N].restored_offset */
+	ret = copy_to_user((void __user *)args->objects,
+				bo_buckets,
+				(args->num_objects * sizeof(*bo_buckets)));
+	if (ret)
+		ret = -EFAULT;
+
+exit:
+	kvfree(objects);
+	return ret;
+}
+
 static int kfd_ioctl_criu_restorer(struct file *filep,
 				struct kfd_process *p, void *data)
 {
-	pr_debug("Inside %s\n", __func__);
+	struct kfd_ioctl_criu_restorer_args *args = data;
+	int ret;
 
-	return 0;
+	if (!args->objects || !args->objects_size)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	switch (args->type) {
+	case KFD_CRIU_OBJECT_TYPE_PROCESS:
+		ret = criu_restore_process(p, args);
+		break;
+	case KFD_CRIU_OBJECT_TYPE_BO:
+		ret = criu_restore_bos(p, args);
+		break;
+	case KFD_CRIU_OBJECT_TYPE_QUEUE:
+	case KFD_CRIU_OBJECT_TYPE_EVENT:
+	case KFD_CRIU_OBJECT_TYPE_DEVICE:
+	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
+	default:
+		pr_err("Unsupported object type:%d\n", args->type);
+		ret = -EINVAL;
+		goto exit;
+	}
+
+exit:
+	mutex_unlock(&p->mutex);
+	if (ret)
+		pr_err("Failed to restore CRIU type:%d ret:%d\n", args->type, ret);
+	else
+		pr_debug("CRIU restore type:%d ret:%d\n", args->type, ret);
+
+	return ret;
 }
 
 static int kfd_ioctl_criu_pause(struct file *filep, struct kfd_process *p, void *data)
-- 
2.17.1



* [PATCH 07/18] drm/amdkfd: CRIU Implement KFD resume ioctl
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (5 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 06/18] drm/amdkfd: CRIU Implement KFD restore ioctl David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-19 13:37 ` [PATCH 08/18] drm/amdkfd: CRIU Implement KFD pause ioctl David Yat Sin
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

This adds support to create userptr BOs on restore and introduces a new
ioctl to restart memory notifiers for the restored userptr BOs.
When doing a CRIU restore, MMU notifications can happen any time after
we call amdgpu_mn_register. Prevent MMU notifications until we reach
stage-4 of the restore process, i.e. until the criu_resume ioctl is
received and the process is ready to be resumed. This ioctl is
different from the other KFD CRIU ioctls since it is called by the CRIU
master restore process for all the target processes being resumed by
CRIU.
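
A hedged sketch of the resume step: unlike the other CRIU ioctls, this
one is issued by the CRIU master on its own /dev/kfd fd (the ioctl is
ROOT_ONLY, so CAP_SYS_ADMIN is required) and names the target process
by pid:

#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <linux/kfd_ioctl.h>

static int resume_target(pid_t target_pid)
{
	struct kfd_ioctl_criu_resume_args args = { .pid = target_pid };
	int fd, ret;

	/* The master's own fd; the target is identified via args.pid */
	fd = open("/dev/kfd", O_RDWR);
	if (fd < 0)
		return -1;

	/* Re-arms MMU notifiers and schedules userptr validation */
	ret = ioctl(fd, AMDKFD_IOC_CRIU_RESUME, &args);
	close(fd);
	return ret;
}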

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
(cherry picked from commit 1f0300f5a4dc12b3c1140b0f0953300b4a6ac81f)
(cherry picked from commit 5c5ae6026ea795ae39acff06db862a7ef2fc6aa9)
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h    |  5 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 51 +++++++++++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 40 +++++++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c      | 36 +++++++++++--
 5 files changed, 120 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 313ee49b9f17..158130a4f4cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -117,6 +117,7 @@ struct amdkfd_process_info {
 	atomic_t evicted_bos;
 	struct delayed_work restore_userptr_work;
 	struct pid *pid;
+	bool block_mmu_notifications;
 };
 
 int amdgpu_amdkfd_init(void);
@@ -249,7 +250,9 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *drm_priv);
 int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		struct kgd_dev *kgd, uint64_t va, uint64_t size,
 		void *drm_priv, struct kgd_mem **mem,
-		uint64_t *offset, uint32_t flags);
+		uint64_t *offset, uint32_t flags, bool criu_resume);
+void amdgpu_amdkfd_block_mmu_notifications(void *p);
+int amdgpu_amdkfd_criu_resume(void *p);
 int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 		struct kgd_dev *kgd, struct kgd_mem *mem, void *drm_priv,
 		uint64_t *size);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index dfa025d694f8..ad8818844526 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -597,7 +597,8 @@ static void remove_kgd_mem_from_kfd_bo_list(struct kgd_mem *mem,
  *
  * Returns 0 for success, negative errno for errors.
  */
-static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
+static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr,
+			   bool criu_resume)
 {
 	struct amdkfd_process_info *process_info = mem->process_info;
 	struct amdgpu_bo *bo = mem->bo;
@@ -619,6 +620,17 @@ static int init_user_pages(struct kgd_mem *mem, uint64_t user_addr)
 		goto out;
 	}
 
+	if (criu_resume) {
+		/*
+		 * During a CRIU restore operation, the userptr buffer objects
+		 * will be validated in the restore_userptr_work worker at a
+		 * later stage when it is scheduled by another ioctl called by
+		 * the CRIU master process for the target pid being restored.
+		 */
+		atomic_inc(&mem->invalid);
+		mutex_unlock(&process_info->lock);
+		return 0;
+	}
 	ret = amdgpu_ttm_tt_get_user_pages(bo, bo->tbo.ttm->pages);
 	if (ret) {
 		pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
@@ -982,6 +994,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 		INIT_DELAYED_WORK(&info->restore_userptr_work,
 				  amdgpu_amdkfd_restore_userptr_worker);
 
+		info->block_mmu_notifications = false;
 		*process_info = info;
 		*ef = dma_fence_get(&info->eviction_fence->base);
 	}
@@ -1139,10 +1152,37 @@ uint64_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *drm_priv)
 	return avm->pd_phys_addr;
 }
 
+void amdgpu_amdkfd_block_mmu_notifications(void *p)
+{
+	struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+	pinfo->block_mmu_notifications = true;
+}
+
+int amdgpu_amdkfd_criu_resume(void *p)
+{
+	int ret = 0;
+	struct amdkfd_process_info *pinfo = (struct amdkfd_process_info *)p;
+
+	mutex_lock(&pinfo->lock);
+	pr_debug("scheduling work\n");
+	atomic_inc(&pinfo->evicted_bos);
+	if (!pinfo->block_mmu_notifications) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+	pinfo->block_mmu_notifications = false;
+	schedule_delayed_work(&pinfo->restore_userptr_work, 0);
+
+out_unlock:
+	mutex_unlock(&pinfo->lock);
+	return ret;
+}
+
 int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		struct kgd_dev *kgd, uint64_t va, uint64_t size,
 		void *drm_priv, struct kgd_mem **mem,
-		uint64_t *offset, uint32_t flags)
+		uint64_t *offset, uint32_t flags, bool criu_resume)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	struct amdgpu_vm *avm = drm_priv_to_vm(drm_priv);
@@ -1247,7 +1287,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 	add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, user_addr);
 
 	if (user_addr) {
-		ret = init_user_pages(*mem, user_addr);
+		pr_debug("creating userptr BO for user_addr = %llu\n", user_addr);
+		ret = init_user_pages(*mem, user_addr, criu_resume);
 		if (ret)
 			goto allocate_init_user_pages_failed;
 	}
@@ -1742,6 +1783,10 @@ int amdgpu_amdkfd_evict_userptr(struct kgd_mem *mem,
 	int evicted_bos;
 	int r = 0;
 
+	/* Do not process MMU notifications until stage-4 IOCTL is received */
+	if (process_info->block_mmu_notifications)
+		return 0;
+
 	atomic_inc(&mem->invalid);
 	evicted_bos = atomic_inc_return(&process_info->evicted_bos);
 	if (evicted_bos == 1) {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 2dab1845f9d3..f0c278e7d7e0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1301,7 +1301,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 	err = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		dev->kgd, args->va_addr, args->size,
 		pdd->drm_priv, (struct kgd_mem **) &mem, &offset,
-		flags);
+		flags, false);
 
 	if (err)
 		goto err_unlock;
@@ -2058,6 +2058,7 @@ static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restore
 {
 	uint8_t *objects, *private_data;
 	struct kfd_criu_bo_bucket *bo_buckets;
+	const bool criu_resume = true;
 	int ret = 0, i, j = 0;
 
 	if (args->objects_size != args->num_objects *
@@ -2066,6 +2067,9 @@ static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restore
 		return -EINVAL;
 	}
 
+	/* Prevent MMU notifications until stage-4 IOCTL (CRIU_RESUME) is received */
+	amdgpu_amdkfd_block_mmu_notifications(p->kgd_process_info);
+
 	objects = kmalloc(args->objects_size, GFP_KERNEL);
 	if (!objects)
 		return -ENOMEM;
@@ -2144,6 +2148,7 @@ static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restore
 			offset = bo_priv->user_addr;
 		}
 
+
 		/* Create the BO */
 		ret = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(dev->kgd,
 						bo_bucket->addr,
@@ -2151,7 +2156,8 @@ static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restore
 						pdd->drm_priv,
 						(struct kgd_mem **) &mem,
 						&offset,
-						bo_bucket->alloc_flags);
+						bo_bucket->alloc_flags,
+						criu_resume);
 		if (ret) {
 			pr_err("Could not create the BO\n");
 			ret = -ENOMEM;
@@ -2308,9 +2314,35 @@ static int kfd_ioctl_criu_pause(struct file *filep, struct kfd_process *p, void
 static int kfd_ioctl_criu_resume(struct file *filep,
 				struct kfd_process *p, void *data)
 {
-	pr_debug("Inside %s\n", __func__);
+	struct kfd_ioctl_criu_resume_args *args = data;
+	struct kfd_process *target = NULL;
+	struct pid *pid = NULL;
+	int ret = 0;
 
-	return 0;
+	pr_debug("Inside %s, target pid for criu restore: %d\n", __func__,
+		 args->pid);
+
+	pid = find_get_pid(args->pid);
+	if (!pid) {
+		pr_err("Cannot find pid info for %i\n", args->pid);
+		return -ESRCH;
+	}
+
+	pr_debug("calling kfd_lookup_process_by_pid\n");
+	target = kfd_lookup_process_by_pid(pid);
+	if (!target) {
+		pr_debug("Cannot find process info for %i\n", args->pid);
+		put_pid(pid);
+		return -ESRCH;
+	}
+
+	mutex_lock(&target->mutex);
+	ret =  amdgpu_amdkfd_criu_resume(target->kgd_process_info);
+	mutex_unlock(&target->mutex);
+
+	put_pid(pid);
+	kfd_unref_process(target);
+	return ret;
 }
 
 static int kfd_ioctl_criu_process_info(struct file *filep,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 8c9f2b3ac85d..719982605587 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -931,6 +931,7 @@ void *kfd_process_device_translate_handle(struct kfd_process_device *p,
 					int handle);
 void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
 					int handle);
+struct kfd_process *kfd_lookup_process_by_pid(struct pid *pid);
 
 bool kfd_has_process_device_data(struct kfd_process *p);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index bc133c3789d8..bbf21395fb06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -65,7 +65,8 @@ static struct workqueue_struct *kfd_process_wq;
  */
 static struct workqueue_struct *kfd_restore_wq;
 
-static struct kfd_process *find_process(const struct task_struct *thread);
+static struct kfd_process *find_process(const struct task_struct *thread,
+					bool ref);
 static void kfd_process_ref_release(struct kref *ref);
 static struct kfd_process *create_process(const struct task_struct *thread);
 static int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep);
@@ -670,7 +671,8 @@ static int kfd_process_alloc_gpuvm(struct kfd_process_device *pdd,
 	int err;
 
 	err = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(kdev->kgd, gpu_va, size,
-						 pdd->drm_priv, &mem, NULL, flags);
+						 pdd->drm_priv, &mem, NULL, flags,
+						 false);
 	if (err)
 		goto err_alloc_mem;
 
@@ -773,7 +775,7 @@ struct kfd_process *kfd_create_process(struct file *filep)
 	mutex_lock(&kfd_processes_mutex);
 
 	/* A prior open of /dev/kfd could have already created the process. */
-	process = find_process(thread);
+	process = find_process(thread, false);
 	if (process) {
 		pr_debug("Process already found\n");
 	} else {
@@ -852,7 +854,7 @@ struct kfd_process *kfd_get_process(const struct task_struct *thread)
 	if (thread->group_leader->mm != thread->mm)
 		return ERR_PTR(-EINVAL);
 
-	process = find_process(thread);
+	process = find_process(thread, false);
 	if (!process)
 		return ERR_PTR(-EINVAL);
 
@@ -871,13 +873,16 @@ static struct kfd_process *find_process_by_mm(const struct mm_struct *mm)
 	return NULL;
 }
 
-static struct kfd_process *find_process(const struct task_struct *thread)
+static struct kfd_process *find_process(const struct task_struct *thread,
+					bool ref)
 {
 	struct kfd_process *p;
 	int idx;
 
 	idx = srcu_read_lock(&kfd_processes_srcu);
 	p = find_process_by_mm(thread->mm);
+	if (p && ref)
+		kref_get(&p->ref);
 	srcu_read_unlock(&kfd_processes_srcu, idx);
 
 	return p;
@@ -1578,6 +1583,27 @@ void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
 		idr_remove(&pdd->alloc_idr, handle);
 }
 
+/* This increments the process->ref counter. */
+struct kfd_process *kfd_lookup_process_by_pid(struct pid *pid)
+{
+	struct task_struct *task = NULL;
+	struct kfd_process *p    = NULL;
+
+	if (!pid) {
+		task = current;
+		get_task_struct(task);
+	} else {
+		task = get_pid_task(pid, PIDTYPE_PID);
+	}
+
+	if (task) {
+		p = find_process(task, true);
+		put_task_struct(task);
+	}
+
+	return p;
+}
+
 /* This increments the process->ref counter. */
 struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid)
 {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/18] drm/amdkfd: CRIU Implement KFD pause ioctl
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (6 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 07/18] drm/amdkfd: CRIU Implement KFD resume ioctl David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-19 13:37 ` [PATCH 09/18] drm/amdkfd: CRIU add queues support David Yat Sin
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

Introduce a pause IOCTL. The CRIU amdgpu plugin needs to call
AMDKFD_IOC_CRIU_PAUSE(pause = 1) before starting the dump and
AMDKFD_IOC_CRIU_PAUSE(pause = 0) when the dump is complete. This
ensures that the queues are not modified between CRIU dump ioctls.
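
For illustration, a minimal user-space sketch of the intended call
sequence. AMDKFD_IOC_CRIU_PAUSE and struct kfd_ioctl_criu_pause_args
come from this series; the uapi header path, kfd_fd and the error
handling are placeholder assumptions:

	#include <sys/ioctl.h>
	#include <linux/kfd_ioctl.h>	/* uapi additions from this series */

	static int checkpoint_paused(int kfd_fd)
	{
		struct kfd_ioctl_criu_pause_args args = { .pause = 1 };

		if (ioctl(kfd_fd, AMDKFD_IOC_CRIU_PAUSE, &args))
			return -1;	/* queues could not be evicted */

		/* ... AMDKFD_IOC_CRIU_DUMPER calls for each object type ... */

		args.pause = 0;		/* dump complete, let queues run again */
		return ioctl(kfd_fd, AMDKFD_IOC_CRIU_PAUSE, &args);
	}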

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 23 +++++++++++++++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  3 +++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  1 +
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f0c278e7d7e0..24e5c53261f5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1984,6 +1984,14 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
 		goto err_unlock;
 	}
 
+	/* Confirm all process queues are evicted */
+	if (!p->queues_paused) {
+		pr_err("Cannot dump process when queues are not in evicted state\n");
+		/* CRIU plugin did not call AMDKFD_IOC_CRIU_PAUSE before dumping */
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+
 	switch (args->type) {
 	case KFD_CRIU_OBJECT_TYPE_PROCESS:
 		ret = criu_dump_process(p, args);
@@ -2306,9 +2314,20 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
 
 static int kfd_ioctl_criu_pause(struct file *filep, struct kfd_process *p, void *data)
 {
-	pr_debug("Inside %s\n", __func__);
+	int ret;
+	struct kfd_ioctl_criu_pause_args *args = data;
 
-	return 0;
+	if (args->pause)
+		ret = kfd_process_evict_queues(p);
+	else
+		ret = kfd_process_restore_queues(p);
+
+	if (ret)
+		pr_err("Failed to %s queues ret:%d\n", args->pause ? "evict" : "restore", ret);
+	else
+		p->queues_paused = !!(args->pause);
+
+	return ret;
 }
 
 static int kfd_ioctl_criu_resume(struct file *filep,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 719982605587..0b8165729cde 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -857,6 +857,9 @@ struct kfd_process {
 	bool svm_disabled;
 
 	bool xnack_enabled;
+
+	/* Queues are in paused state because we are in the process of doing a CRIU checkpoint */
+	bool queues_paused;
 };
 
 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index bbf21395fb06..e4cb2f778590 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1268,6 +1268,7 @@ static struct kfd_process *create_process(const struct task_struct *thread)
 	process->lead_thread = thread->group_leader;
 	process->n_pdds = 0;
 	process->svm_disabled = false;
+	process->queues_paused = false;
 	INIT_DELAYED_WORK(&process->eviction_work, evict_process_worker);
 	INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
 	process->last_restore_timestamp = get_jiffies_64();
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/18] drm/amdkfd: CRIU add queues support
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (7 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 08/18] drm/amdkfd: CRIU Implement KFD pause ioctl David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-23 18:29   ` Felix Kuehling
  2021-08-19 13:37 ` [PATCH 10/18] drm/amdkfd: CRIU restore queue ids David Yat Sin
                   ` (8 subsequent siblings)
  17 siblings, 1 reply; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

Add support to the existing CRIU ioctls to save the number of queues
and the properties of each queue during checkpoint, and to re-create
the queues on restore.
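
The dumper lays out the objects buffer as a fixed-size array of struct
kfd_criu_queue_bucket entries followed by the variable-size per-queue
private data (a struct kfd_criu_queue_priv_data header with the cu_mask
behind it). A sketch of how user space could walk that layout once the
dumper ioctl has filled it in; the structure layouts are the ones added
in this patch, everything else is illustrative:

	#include <stdio.h>
	#include <stdint.h>

	static void walk_queue_objects(void *objects, uint64_t num_objects)
	{
		struct kfd_criu_queue_bucket *buckets = objects;
		uint8_t *priv = (uint8_t *)(buckets + num_objects);
		uint64_t i;

		for (i = 0; i < num_objects; i++) {
			struct kfd_criu_queue_priv_data *q_data =
				(void *)(priv + buckets[i].priv_data_offset);
			uint32_t *cu_mask = (uint32_t *)(q_data + 1);

			printf("gpu_id 0x%x queue %u: %u cu_mask words\n",
			       buckets[i].gpu_id, q_data->q_id,
			       q_data->cu_mask_size / 4);
			(void)cu_mask;	/* mask bits live right after the header */
		}
	}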

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 380 ++++++++++++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  22 +-
 2 files changed, 400 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 24e5c53261f5..6f1c9fb8d46c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1965,6 +1965,213 @@ static int criu_dump_bos(struct kfd_process *p, struct kfd_ioctl_criu_dumper_arg
 	return ret;
 }
 
+static void get_queue_data_sizes(struct kfd_process_device *pdd,
+				struct queue *q,
+				uint32_t *cu_mask_size)
+{
+	*cu_mask_size = sizeof(uint32_t) * (q->properties.cu_mask_count / 32);
+}
+
+int get_process_queue_info(struct kfd_process *p, uint32_t *num_queues, uint32_t *q_data_sizes)
+{
+	u32 data_sizes = 0;
+	u32 q_index = 0;
+	struct queue *q;
+	int i;
+
+	/* Run over all PDDs of the process */
+	for (i = 0; i < p->n_pdds; i++) {
+		struct kfd_process_device *pdd = p->pdds[i];
+
+		list_for_each_entry(q, &pdd->qpd.queues_list, list) {
+			if (q->properties.type == KFD_QUEUE_TYPE_COMPUTE ||
+				q->properties.type == KFD_QUEUE_TYPE_SDMA ||
+				q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
+				u32 cu_mask_size;
+
+				get_queue_data_sizes(pdd, q, &cu_mask_size);
+
+				data_sizes += cu_mask_size;
+				q_index++;
+			} else {
+				pr_err("Unsupported queue type (%d)\n", q->properties.type);
+				return -EOPNOTSUPP;
+			}
+		}
+	}
+	*num_queues = q_index;
+	*q_data_sizes = data_sizes;
+
+	return 0;
+}
+
+static void criu_dump_queue(struct kfd_process_device *pdd,
+			   struct queue *q,
+			   struct kfd_criu_queue_bucket *q_bucket,
+			   void *private_data)
+{
+	struct kfd_criu_queue_priv_data *q_data = (struct kfd_criu_queue_priv_data *) private_data;
+	uint8_t *cu_mask;
+
+	cu_mask = (void *)(q_data + 1);
+
+	q_bucket->gpu_id = pdd->dev->id;
+	q_data->type = q->properties.type;
+	q_data->format = q->properties.format;
+	q_data->q_id =  q->properties.queue_id;
+	q_data->q_address = q->properties.queue_address;
+	q_data->q_size = q->properties.queue_size;
+	q_data->priority = q->properties.priority;
+	q_data->q_percent = q->properties.queue_percent;
+	q_data->read_ptr_addr = (uint64_t)q->properties.read_ptr;
+	q_data->write_ptr_addr = (uint64_t)q->properties.write_ptr;
+	q_data->doorbell_id = q->doorbell_id;
+
+	q_data->sdma_id = q->sdma_id;
+
+	q_data->eop_ring_buffer_address =
+		q->properties.eop_ring_buffer_address;
+
+	q_data->eop_ring_buffer_size = q->properties.eop_ring_buffer_size;
+
+	q_data->ctx_save_restore_area_address =
+		q->properties.ctx_save_restore_area_address;
+
+	q_data->ctx_save_restore_area_size =
+		q->properties.ctx_save_restore_area_size;
+
+	if (q_data->cu_mask_size)
+		memcpy(cu_mask, q->properties.cu_mask, q_data->cu_mask_size);
+
+	pr_debug("Dumping Queue: gpu_id:%x queue_id:%u\n", q_bucket->gpu_id, q_data->q_id);
+}
+
+static int criu_dump_queues_device(struct kfd_process_device *pdd,
+				unsigned int *q_index,
+				unsigned int max_num_queues,
+				struct kfd_criu_queue_bucket *q_buckets,
+				uint8_t *user_priv_data,
+				uint64_t *queues_priv_data_offset)
+{
+	struct queue *q;
+	uint8_t *q_private_data = NULL; /* Local buffer to store individual queue private data */
+	unsigned int q_private_data_size = 0;
+	int ret = 0;
+
+	list_for_each_entry(q, &pdd->qpd.queues_list, list) {
+		struct kfd_criu_queue_bucket q_bucket;
+		struct kfd_criu_queue_priv_data *q_data;
+		uint64_t q_data_size;
+		uint32_t cu_mask_size;
+
+		if (q->properties.type != KFD_QUEUE_TYPE_COMPUTE &&
+			q->properties.type != KFD_QUEUE_TYPE_SDMA &&
+			q->properties.type != KFD_QUEUE_TYPE_SDMA_XGMI) {
+
+			pr_err("Unsupported queue type (%d)\n", q->properties.type);
+			return -EOPNOTSUPP;
+		}
+
+		memset(&q_bucket, 0, sizeof(q_bucket));
+
+		get_queue_data_sizes(pdd, q, &cu_mask_size);
+
+		q_data_size = sizeof(*q_data) + cu_mask_size;
+
+		/* Increase local buffer space if needed */
+		if (q_private_data_size < q_data_size) {
+			kfree(q_private_data);
+
+			q_private_data = kzalloc(q_data_size, GFP_KERNEL);
+			if (!q_private_data) {
+				ret = -ENOMEM;
+				break;
+			}
+			q_private_data_size = q_data_size;
+		}
+
+		q_data = (struct kfd_criu_queue_priv_data *)q_private_data;
+
+		q_data->cu_mask_size = cu_mask_size;
+
+		criu_dump_queue(pdd, q, &q_bucket, q_data);
+
+		q_bucket.priv_data_offset = *queues_priv_data_offset;
+		q_bucket.priv_data_size = q_data_size;
+
+		ret = copy_to_user((void __user *) (user_priv_data + q_bucket.priv_data_offset),
+				q_private_data, q_bucket.priv_data_size);
+		if (ret) {
+			ret = -EFAULT;
+			break;
+		}
+		*queues_priv_data_offset += q_data_size;
+
+		ret = copy_to_user((void __user *)&q_buckets[*q_index],
+					&q_bucket, sizeof(q_bucket));
+		if (ret) {
+			pr_err("Failed to copy queue information to user\n");
+			ret = -EFAULT;
+			break;
+		}
+		*q_index = *q_index + 1;
+	}
+
+	kfree(q_private_data);
+
+	return ret;
+}
+
+static int criu_dump_queues(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
+{
+	struct kfd_criu_queue_bucket *queue_buckets;
+	uint32_t num_queues, queue_extra_data_sizes;
+	uint64_t queues_priv_data_offset = 0;
+	int ret = 0, pdd_index, q_index = 0;
+	void *private_data; /* Pointer to first private data in userspace */
+
+	ret = get_process_queue_info(p, &num_queues, &queue_extra_data_sizes);
+	if (ret)
+		return ret;
+
+	if (args->num_objects != num_queues) {
+		pr_err("Mismatch with number of queues (current:%d user:%lld)\n",
+							num_queues, args->num_objects);
+		return -EINVAL;
+	}
+
+	if (args->objects_size != queue_extra_data_sizes +
+				  (num_queues * (sizeof(*queue_buckets) +
+						 sizeof(struct kfd_criu_queue_priv_data)))) {
+		pr_err("Invalid objects size for queues\n");
+		return -EINVAL;
+	}
+
+	/* Per-queue private data size varies: it also includes cu_mask, mqd
+	 * and ctl_stack. The first private data starts after all queue_buckets.
+	 */
+
+	queue_buckets = (struct kfd_criu_queue_bucket *)args->objects;
+	private_data = (void *)(queue_buckets + args->num_objects);
+
+	for (pdd_index = 0; pdd_index < p->n_pdds; pdd_index++) {
+		struct kfd_process_device *pdd = p->pdds[pdd_index];
+
+		/* criu_dump_queues_device will copy data to user */
+		ret = criu_dump_queues_device(pdd,
+					      &q_index,
+					      args->num_objects,
+					      queue_buckets,
+					      private_data,
+					      &queues_priv_data_offset);
+
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
 static int kfd_ioctl_criu_dumper(struct file *filep,
 				struct kfd_process *p, void *data)
 {
@@ -2000,6 +2207,8 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
 		ret = criu_dump_bos(p, args);
 		break;
 	case KFD_CRIU_OBJECT_TYPE_QUEUE:
+		ret = criu_dump_queues(p, args);
+		break;
 	case KFD_CRIU_OBJECT_TYPE_EVENT:
 	case KFD_CRIU_OBJECT_TYPE_DEVICE:
 	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
@@ -2274,6 +2483,163 @@ static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restore
 	return ret;
 }
 
+static int set_queue_properties_from_criu(struct queue_properties *qp,
+					  struct kfd_criu_queue_bucket *q_bucket,
+					  struct kfd_criu_queue_priv_data *q_data,
+					  void *cu_mask)
+{
+	qp->is_interop = false;
+	qp->is_gws = q_data->is_gws;
+	qp->queue_percent = q_data->q_percent;
+	qp->priority = q_data->priority;
+	qp->queue_address = q_data->q_address;
+	qp->queue_size = q_data->q_size;
+	qp->read_ptr = (uint32_t *) q_data->read_ptr_addr;
+	qp->write_ptr = (uint32_t *) q_data->write_ptr_addr;
+	qp->eop_ring_buffer_address = q_data->eop_ring_buffer_address;
+	qp->eop_ring_buffer_size = q_data->eop_ring_buffer_size;
+	qp->ctx_save_restore_area_address = q_data->ctx_save_restore_area_address;
+	qp->ctx_save_restore_area_size = q_data->ctx_save_restore_area_size;
+	qp->ctl_stack_size = q_data->ctl_stack_size;
+	qp->type = q_data->type;
+	qp->format = q_data->format;
+
+	if (q_data->cu_mask_size) {
+		qp->cu_mask = kzalloc(q_data->cu_mask_size, GFP_KERNEL);
+		if (!qp->cu_mask)
+			return -ENOMEM;
+
+		/* CU mask is stored after q_data */
+		memcpy(qp->cu_mask, cu_mask, q_data->cu_mask_size);
+		qp->cu_mask_count = (q_data->cu_mask_size / sizeof(uint32_t)) * 32;
+	}
+
+	return 0;
+}
+
+static int criu_restore_queue(struct kfd_process *p,
+			      struct kfd_dev *dev,
+			      struct kfd_process_device *pdd,
+			      struct kfd_criu_queue_bucket *q_bucket,
+			      void *private_data)
+{
+	struct kfd_criu_queue_priv_data *q_data = (struct kfd_criu_queue_priv_data *) private_data;
+	uint8_t *cu_mask, *mqd, *ctl_stack;
+	struct queue_properties qp;
+	unsigned int queue_id;
+	int ret = 0;
+
+	pr_debug("Restoring Queue: gpu_id:%x queue_id:%u\n", q_bucket->gpu_id, q_data->q_id);
+
+	/* data stored in this order: cu_mask, mqd, ctl_stack */
+	cu_mask = (void *)(q_data + 1);
+	mqd = cu_mask + q_data->cu_mask_size;
+	ctl_stack = mqd + q_data->mqd_size;
+
+	memset(&qp, 0, sizeof(qp));
+	ret = set_queue_properties_from_criu(&qp, q_bucket, q_data, cu_mask);
+	if (ret)
+		goto err_create_queue;
+
+	print_queue_properties(&qp);
+
+	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, NULL);
+	if (ret) {
+		pr_err("Failed to create new queue err:%d\n", ret);
+		ret = -EINVAL;
+		goto err_create_queue;
+	}
+
+	pr_debug("Queue id %d was restored successfully\n", queue_id);
+
+	return 0;
+err_create_queue:
+	kfree(qp.cu_mask);
+
+	return ret;
+}
+
+static int criu_restore_queues(struct kfd_process *p,
+			       struct kfd_ioctl_criu_restorer_args *args)
+{
+	int ret = 0, i;
+	struct kfd_criu_queue_bucket *user_buckets;
+	uint8_t *all_private_data; /* Pointer to first private data in userspace */
+	uint8_t *q_private_data = NULL; /* Local buffer for individual queue private data */
+	unsigned int q_private_data_size = 0;
+
+	user_buckets = (struct kfd_criu_queue_bucket *)args->objects;
+	all_private_data = (void *)(user_buckets + args->num_objects);
+
+	/*
+	 * This process will not have any queues at this point, but we are
+	 * setting all the dqms for this process to the evicted state.
+	 */
+	kfd_process_evict_queues(p);
+
+	for (i = 0; i < args->num_objects; i++) {
+		struct kfd_process_device *pdd;
+		struct kfd_dev *dev;
+		struct kfd_criu_queue_bucket q_bucket;
+
+		ret = copy_from_user(&q_bucket, (void __user *)&user_buckets[i],
+				sizeof(struct kfd_criu_queue_bucket));
+
+		if (ret) {
+			ret = -EFAULT;
+			goto exit;
+		}
+
+		/* Increase local buffer space if needed */
+		if (q_bucket.priv_data_size > q_private_data_size) {
+			kfree(q_private_data);
+
+			q_private_data = kmalloc(q_bucket.priv_data_size, GFP_KERNEL);
+			if (!q_private_data) {
+				ret = -ENOMEM;
+				goto exit;
+			}
+			q_private_data_size = q_bucket.priv_data_size;
+		}
+
+		ret = copy_from_user(q_private_data,
+				(void __user *) (all_private_data + q_bucket.priv_data_offset),
+				q_bucket.priv_data_size);
+		if (ret) {
+			ret = -EFAULT;
+			goto exit;
+		}
+
+		dev = kfd_device_by_id(q_bucket.gpu_id);
+		if (!dev) {
+			pr_err("Could not get kfd_dev from gpu_id = 0x%x\n",
+			q_bucket.gpu_id);
+
+			ret = -EINVAL;
+			goto exit;
+		}
+
+		pdd = kfd_get_process_device_data(dev, p);
+		if (!pdd) {
+			pr_err("Failed to get pdd\n");
+			ret = -EFAULT;
+			goto exit;
+		}
+
+		ret = criu_restore_queue(p, dev, pdd, &q_bucket, q_private_data);
+		if (ret) {
+			pr_err("Failed to restore queue (%d)\n", ret);
+			goto exit;
+		}
+
+	}
+
+exit:
+	kfree(q_private_data);
+
+	return ret;
+}
+
 static int kfd_ioctl_criu_restorer(struct file *filep,
 				struct kfd_process *p, void *data)
 {
@@ -2293,6 +2659,8 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
 		ret = criu_restore_bos(p, args);
 		break;
 	case KFD_CRIU_OBJECT_TYPE_QUEUE:
+		ret = criu_restore_queues(p, args);
+		break;
 	case KFD_CRIU_OBJECT_TYPE_EVENT:
 	case KFD_CRIU_OBJECT_TYPE_DEVICE:
 	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
@@ -2368,6 +2736,7 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
 				struct kfd_process *p, void *data)
 {
 	struct kfd_ioctl_criu_process_info_args *args = data;
+	uint32_t queues_extra_data_size;
 	int ret = 0;
 
 	pr_debug("Inside %s\n", __func__);
@@ -2387,7 +2756,16 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
 	args->total_bos = get_process_num_bos(p);
 	args->bos_priv_data_size = args->total_bos * sizeof(struct kfd_criu_bo_priv_data);
 
-	dev_dbg(kfd_device, "Num of bos:%llu\n", args->total_bos);
+	ret = get_process_queue_info(p, &args->total_queues, &queues_extra_data_size);
+	if (ret)
+		goto err_unlock;
+
+	args->queues_priv_data_size = queues_extra_data_size +
+				(args->total_queues * sizeof(struct kfd_criu_queue_priv_data));
+
+	dev_dbg(kfd_device, "Num of bos:%llu queues:%u\n",
+				args->total_bos,
+				args->total_queues);
 err_unlock:
 	mutex_unlock(&p->mutex);
 	return ret;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 0b8165729cde..4b4808b191f2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1044,7 +1044,27 @@ struct kfd_criu_svm_range_priv_data {
 };
 
 struct kfd_criu_queue_priv_data {
-	uint64_t reserved;
+	uint64_t q_address;
+	uint64_t q_size;
+	uint64_t read_ptr_addr;
+	uint64_t write_ptr_addr;
+	uint64_t doorbell_off;
+	uint64_t eop_ring_buffer_address;
+	uint64_t ctx_save_restore_area_address;
+	uint32_t gpu_id;
+	uint32_t type;
+	uint32_t format;
+	uint32_t q_id;
+	uint32_t priority;
+	uint32_t q_percent;
+	uint32_t doorbell_id;
+	uint32_t is_gws;
+	uint32_t sdma_id;
+	uint32_t eop_ring_buffer_size;
+	uint32_t ctx_save_restore_area_size;
+	uint32_t ctl_stack_size;
+	uint32_t cu_mask_size;
+	uint32_t mqd_size;
 };
 
 struct kfd_criu_event_priv_data {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/18] drm/amdkfd: CRIU restore queue ids
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (8 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 09/18] drm/amdkfd: CRIU add queues support David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-23 18:29   ` Felix Kuehling
  2021-08-19 13:37 ` [PATCH 11/18] drm/amdkfd: CRIU restore sdma id for queues David Yat Sin
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

When re-creating queues during CRIU restore, restore each queue with
the same queue id value used during CRIU dump. Add a new private
structure, queue_restore_data, to store queue restore information.
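
The checkpointed metadata is threaded down through a new
pqm_create_queue() parameter; the two call shapes used in this patch
show the contract:

	/* Normal creation: the kernel picks a free qid */
	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties,
			       &queue_id, NULL, &doorbell_offset_in_process);

	/* CRIU restore: q_data carries the checkpointed qid to re-use;
	 * creation fails with -ENOSPC if that qid is already taken.
	 */
	err = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id,
			       q_data, NULL);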

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  4 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c       |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  2 ++
 .../amd/amdkfd/kfd_process_queue_manager.c    | 22 ++++++++++++++++++-
 4 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6f1c9fb8d46c..813ed42e3ce6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 			p->pasid,
 			dev->id);
 
-	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id,
+	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, NULL,
 			&doorbell_offset_in_process);
 	if (err != 0)
 		goto err_create_queue;
@@ -2543,7 +2543,7 @@ static int criu_restore_queue(struct kfd_process *p,
 
 	print_queue_properties(&qp);
 
-	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, NULL);
+	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, q_data, NULL);
 	if (ret) {
 		pr_err("Failed to create new queue err:%d\n", ret);
 		ret = -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 159add0f5aaa..749a7a3bf191 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
 	properties.type = KFD_QUEUE_TYPE_DIQ;
 
 	status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-				&properties, &qid, NULL);
+				&properties, &qid, NULL, NULL);
 
 	if (status) {
 		pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 4b4808b191f2..eaf5fe1480e9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -468,6 +468,7 @@ enum KFD_QUEUE_PRIORITY {
  * it's user mode or kernel mode queue.
  *
  */
+
 struct queue_properties {
 	enum kfd_queue_type type;
 	enum kfd_queue_format format;
@@ -1114,6 +1115,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			    struct file *f,
 			    struct queue_properties *properties,
 			    unsigned int *qid,
+			    const struct kfd_criu_queue_priv_data *q_data,
 			    uint32_t *p_doorbell_offset_in_process);
 int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
 int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 95a6c36cea4c..e6abab16b8de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -42,6 +42,20 @@ static inline struct process_queue_node *get_queue_by_qid(
 	return NULL;
 }
 
+static int assign_queue_slot_by_qid(struct process_queue_manager *pqm,
+				    unsigned int qid)
+{
+	if (qid >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
+		return -EINVAL;
+
+	if (__test_and_set_bit(qid, pqm->queue_slot_bitmap)) {
+		pr_err("Cannot create new queue because requested qid(%u) is in use\n", qid);
+		return -ENOSPC;
+	}
+
+	return 0;
+}
+
 static int find_available_queue_slot(struct process_queue_manager *pqm,
 					unsigned int *qid)
 {
@@ -193,6 +207,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			    struct file *f,
 			    struct queue_properties *properties,
 			    unsigned int *qid,
+			    const struct kfd_criu_queue_priv_data *q_data,
 			    uint32_t *p_doorbell_offset_in_process)
 {
 	int retval;
@@ -224,7 +239,12 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 	if (pdd->qpd.queue_count >= max_queues)
 		return -ENOSPC;
 
-	retval = find_available_queue_slot(pqm, qid);
+	if (q_data) {
+		retval = assign_queue_slot_by_qid(pqm, q_data->q_id);
+		*qid = q_data->q_id;
+	} else
+		retval = find_available_queue_slot(pqm, qid);
+
 	if (retval != 0)
 		return retval;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 11/18] drm/amdkfd: CRIU restore sdma id for queues
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (9 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 10/18] drm/amdkfd: CRIU restore queue ids David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-19 13:37 ` [PATCH 12/18] drm/amdkfd: CRIU restore queue doorbell id David Yat Sin
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.
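
The sdma id re-use in allocate_sdma_queue() reduces to a check-and-clear
on the dqm's 64-bit free mask (bit set means the id is free). A
condensed standalone model, with __builtin_ctzll standing in for the
kernel's __ffs64:

	#include <errno.h>
	#include <stdint.h>

	static int take_sdma_id(uint64_t *free_mask,
				const uint32_t *restore_id, uint32_t *out_id)
	{
		if (!*free_mask)
			return -ENOMEM;		/* no ids left */
		if (restore_id) {
			if (!(*free_mask & (1ULL << *restore_id)))
				return -EBUSY;	/* checkpointed id now taken */
			*out_id = *restore_id;
		} else {
			*out_id = __builtin_ctzll(*free_mask);
		}
		*free_mask &= ~(1ULL << *out_id);
		return 0;
	}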

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++++++++++++++-----
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  3 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    |  4 +-
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 98c2046c7331..677f94e93218 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -58,7 +58,7 @@ static inline void deallocate_hqd(struct device_queue_manager *dqm,
 				struct queue *q);
 static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q);
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-				struct queue *q);
+				struct queue *q, const uint32_t *restore_sdma_id);
 static void kfd_process_hw_exception(struct work_struct *work);
 
 static inline
@@ -296,7 +296,8 @@ static void deallocate_vmid(struct device_queue_manager *dqm,
 
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
 				struct queue *q,
-				struct qcm_process_device *qpd)
+				struct qcm_process_device *qpd,
+				const struct kfd_criu_queue_priv_data *qd)
 {
 	struct mqd_manager *mqd_mgr;
 	int retval;
@@ -336,7 +337,7 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 			q->pipe, q->queue);
 	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
-		retval = allocate_sdma_queue(dqm, q);
+		retval = allocate_sdma_queue(dqm, q, qd ? &qd->sdma_id : NULL);
 		if (retval)
 			goto deallocate_vmid;
 		dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1022,7 +1023,7 @@ static void pre_reset(struct device_queue_manager *dqm)
 }
 
 static int allocate_sdma_queue(struct device_queue_manager *dqm,
-				struct queue *q)
+				struct queue *q, const uint32_t *restore_sdma_id)
 {
 	int bit;
 
@@ -1032,9 +1033,21 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 			return -ENOMEM;
 		}
 
-		bit = __ffs64(dqm->sdma_bitmap);
-		dqm->sdma_bitmap &= ~(1ULL << bit);
-		q->sdma_id = bit;
+		if (restore_sdma_id) {
+			/* Re-use existing sdma_id */
+			if (!(dqm->sdma_bitmap & (1ULL << *restore_sdma_id))) {
+				pr_err("SDMA queue already in use\n");
+				return -EBUSY;
+			}
+			dqm->sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+			q->sdma_id = *restore_sdma_id;
+		} else {
+			/* Find first available sdma_id */
+			bit = __ffs64(dqm->sdma_bitmap);
+			dqm->sdma_bitmap &= ~(1ULL << bit);
+			q->sdma_id = bit;
+		}
+
 		q->properties.sdma_engine_id = q->sdma_id %
 				get_num_sdma_engines(dqm);
 		q->properties.sdma_queue_id = q->sdma_id /
@@ -1044,9 +1057,19 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 			pr_err("No more XGMI SDMA queue to allocate\n");
 			return -ENOMEM;
 		}
-		bit = __ffs64(dqm->xgmi_sdma_bitmap);
-		dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
-		q->sdma_id = bit;
+		if (restore_sdma_id) {
+			/* Re-use existing sdma_id */
+			if (!(dqm->xgmi_sdma_bitmap & (1ULL << *restore_sdma_id))) {
+				pr_err("SDMA queue already in use\n");
+				return -EBUSY;
+			}
+			dqm->xgmi_sdma_bitmap &= ~(1ULL << *restore_sdma_id);
+			q->sdma_id = *restore_sdma_id;
+		} else {
+			bit = __ffs64(dqm->xgmi_sdma_bitmap);
+			dqm->xgmi_sdma_bitmap &= ~(1ULL << bit);
+			q->sdma_id = bit;
+		}
 		/* sdma_engine_id is sdma id including
 		 * both PCIe-optimized SDMAs and XGMI-
 		 * optimized SDMAs. The calculation below
@@ -1269,7 +1292,8 @@ static void destroy_kernel_queue_cpsch(struct device_queue_manager *dqm,
 }
 
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
-			struct qcm_process_device *qpd)
+			struct qcm_process_device *qpd,
+			const struct kfd_criu_queue_priv_data *qd)
 {
 	int retval;
 	struct mqd_manager *mqd_mgr;
@@ -1284,7 +1308,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
 		dqm_lock(dqm);
-		retval = allocate_sdma_queue(dqm, q);
+		retval = allocate_sdma_queue(dqm, q, qd ? &qd->sdma_id : NULL);
 		dqm_unlock(dqm);
 		if (retval)
 			goto out;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 71e2fde56b2b..02cfa098ca1c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -86,7 +86,8 @@ struct device_process_node {
 struct device_queue_manager_ops {
 	int	(*create_queue)(struct device_queue_manager *dqm,
 				struct queue *q,
-				struct qcm_process_device *qpd);
+				struct qcm_process_device *qpd,
+				const struct kfd_criu_queue_priv_data *qd);
 
 	int	(*destroy_queue)(struct device_queue_manager *dqm,
 				struct qcm_process_device *qpd,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index e6abab16b8de..f30e128ee9c5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -272,7 +272,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			goto err_create_queue;
 		pqn->q = q;
 		pqn->kq = NULL;
-		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd);
+		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data);
 		print_queue(q);
 		break;
 
@@ -292,7 +292,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			goto err_create_queue;
 		pqn->q = q;
 		pqn->kq = NULL;
-		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd);
+		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data);
 		print_queue(q);
 		break;
 	case KFD_QUEUE_TYPE_DIQ:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 12/18] drm/amdkfd: CRIU restore queue doorbell id
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (10 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 11/18] drm/amdkfd: CRIU restore sdma id for queues David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-19 13:37 ` [PATCH 13/18] drm/amdkfd: CRIU dump and restore queue mqds David Yat Sin
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.
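
For SDMA queues the doorbell id is fully derived from the engine and
queue ids, so the restore path can only validate that the checkpointed
value still matches. A worked example of the formula in
allocate_doorbell(), with an illustrative idx_offset value
(KFD_QUEUE_DOORBELL_MIRROR_OFFSET is 512):

	/* sdma_engine_id = 1, sdma_queue_id = 3, idx_offset[1] = 24: */
	uint32_t valid_id = 24			/* idx_offset[sdma_engine_id] */
			  + (3 & 1) * 512	/* odd queue: mirrored range */
			  + (3 >> 1);		/* queue pair index */
	/* valid_id == 537; a checkpointed doorbell_id != 537 is rejected
	 * with -EINVAL
	 */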

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 61 +++++++++++++------
 1 file changed, 41 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 677f94e93218..5c268c7726d2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -153,7 +153,13 @@ static void decrement_queue_count(struct device_queue_manager *dqm,
 		dqm->active_cp_queue_count--;
 }
 
-static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
+/*
+ * Allocate a doorbell ID to this queue.
+ * If restore_id is passed in, make sure the requested ID is valid, then allocate it.
+ */
+static int allocate_doorbell(struct qcm_process_device *qpd,
+			     struct queue *q,
+			     uint32_t const *restore_id)
 {
 	struct kfd_dev *dev = qpd->dqm->dev;
 
@@ -161,6 +167,9 @@ static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
 		/* On pre-SOC15 chips we need to use the queue ID to
 		 * preserve the user mode ABI.
 		 */
+		if (restore_id && *restore_id != q->properties.queue_id)
+			return -EINVAL;
+
 		q->doorbell_id = q->properties.queue_id;
 	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 			q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
@@ -169,25 +178,37 @@ static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
 		 * The doobell index distance between RLC (2*i) and (2*i+1)
 		 * for a SDMA engine is 512.
 		 */
-		uint32_t *idx_offset =
-				dev->shared_resources.sdma_doorbell_idx;
 
-		q->doorbell_id = idx_offset[q->properties.sdma_engine_id]
-			+ (q->properties.sdma_queue_id & 1)
-			* KFD_QUEUE_DOORBELL_MIRROR_OFFSET
-			+ (q->properties.sdma_queue_id >> 1);
+		uint32_t *idx_offset = dev->shared_resources.sdma_doorbell_idx;
+		uint32_t valid_id = idx_offset[q->properties.sdma_engine_id]
+						+ (q->properties.sdma_queue_id & 1)
+						* KFD_QUEUE_DOORBELL_MIRROR_OFFSET
+						+ (q->properties.sdma_queue_id >> 1);
+
+		if (restore_id && *restore_id != valid_id)
+			return -EINVAL;
+		q->doorbell_id = valid_id;
 	} else {
-		/* For CP queues on SOC15 reserve a free doorbell ID */
-		unsigned int found;
-
-		found = find_first_zero_bit(qpd->doorbell_bitmap,
-					    KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
-		if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
-			pr_debug("No doorbells available");
-			return -EBUSY;
+		/* For CP queues on SOC15 */
+		if (restore_id) {
+			/* make sure that the requested ID is free */
+			if (__test_and_set_bit(*restore_id, qpd->doorbell_bitmap))
+				return -EINVAL;
+
+			q->doorbell_id = *restore_id;
+		} else {
+			/* or reserve a free doorbell ID */
+			unsigned int found;
+
+			found = find_first_zero_bit(qpd->doorbell_bitmap,
+						KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+			if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
+				pr_debug("No doorbells available");
+				return -EBUSY;
+			}
+			set_bit(found, qpd->doorbell_bitmap);
+			q->doorbell_id = found;
 		}
-		set_bit(found, qpd->doorbell_bitmap);
-		q->doorbell_id = found;
 	}
 
 	q->properties.doorbell_off =
@@ -343,7 +364,7 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 		dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
 	}
 
-	retval = allocate_doorbell(qpd, q);
+	retval = allocate_doorbell(qpd, q, qd ? &qd->doorbell_id : NULL);
 	if (retval)
 		goto out_deallocate_hqd;
 
@@ -998,7 +1019,7 @@ static int start_nocpsch(struct device_queue_manager *dqm)
 {
 	pr_info("SW scheduler is used");
 	init_interrupts(dqm);
-	
+
 	if (dqm->dev->device_info->asic_family == CHIP_HAWAII)
 		return pm_init(&dqm->packets, dqm);
 	dqm->sched_running = true;
@@ -1314,7 +1335,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 			goto out;
 	}
 
-	retval = allocate_doorbell(qpd, q);
+	retval = allocate_doorbell(qpd, q, qd ? &qd->doorbell_id : NULL);
 	if (retval)
 		goto out_deallocate_sdma_queue;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 13/18] drm/amdkfd: CRIU dump and restore queue mqds
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (11 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 12/18] drm/amdkfd: CRIU restore queue doorbell id David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-19 13:37 ` [PATCH 14/18] drm/amdkfd: CRIU dump/restore queue control stack David Yat Sin
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

Dump the contents of each queue's MQD during CRIU dump and restore
them during CRIU restore.
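
All four ASIC generations get the same pair of mqd_manager hooks. Their
shape, condensed to the v9 compute case (a sketch, not the full patch
body): dumping is a plain copy because the hardware MQD is ordinary
memory:

	static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
	{
		/* checkpoint == byte copy of the HW-format MQD */
		memcpy(mqd_dst, get_mqd(mqd), sizeof(struct v9_mqd));
	}

The matching restore_mqd() copies the saved image into the freshly
allocated MQD object, re-derives cp_hqd_pq_doorbell_control from the
restored queue's doorbell offset, and leaves the queue inactive until
it is explicitly mapped again.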

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 53 ++++++++++----
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c       |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 70 +++++++++++++++++--
 .../drm/amd/amdkfd/kfd_device_queue_manager.h | 11 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |  7 ++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 67 ++++++++++++++++++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 68 ++++++++++++++++++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 67 ++++++++++++++++++
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 68 ++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  7 ++
 .../amd/amdkfd/kfd_process_queue_manager.c    | 47 ++++++++++++-
 11 files changed, 444 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 813ed42e3ce6..68b06037616f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -313,7 +313,7 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 			dev->id);
 
 	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, NULL,
-			&doorbell_offset_in_process);
+			       NULL, &doorbell_offset_in_process);
 	if (err != 0)
 		goto err_create_queue;
 
@@ -1965,11 +1965,20 @@ static int criu_dump_bos(struct kfd_process *p, struct kfd_ioctl_criu_dumper_arg
 	return ret;
 }
 
-static void get_queue_data_sizes(struct kfd_process_device *pdd,
+static int get_queue_data_sizes(struct kfd_process_device *pdd,
 				struct queue *q,
-				uint32_t *cu_mask_size)
+				uint32_t *cu_mask_size,
+				uint32_t *mqd_size)
 {
+	int ret;
+
 	*cu_mask_size = sizeof(uint32_t) * (q->properties.cu_mask_count / 32);
+
+	ret = pqm_get_queue_dump_info(&pdd->process->pqm, q->properties.queue_id, mqd_size);
+	if (ret)
+		pr_err("Failed to get queue dump info (%d)\n", ret);
+
+	return ret;
 }
 
 int get_process_queue_info(struct kfd_process *p, uint32_t *num_queues, uint32_t *q_data_sizes)
@@ -1988,10 +1997,14 @@ int get_process_queue_info(struct kfd_process *p, uint32_t *num_queues, uint32_t
 				q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 				q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
 				u32 cu_mask_size;
+				u32 mqd_size;
+				int ret;
 
-				get_queue_data_sizes(pdd, q, &cu_mask_size);
+				ret = get_queue_data_sizes(pdd, q, &cu_mask_size, &mqd_size);
+				if (ret)
+					return ret;
 
-				data_sizes += cu_mask_size;
+				data_sizes += cu_mask_size + mqd_size;
 				q_index++;
 			} else {
 				pr_err("Unsupported queue type (%d)\n", q->properties.type);
@@ -2005,15 +2018,17 @@ int get_process_queue_info(struct kfd_process *p, uint32_t *num_queues, uint32_t
 	return 0;
 }
 
-static void criu_dump_queue(struct kfd_process_device *pdd,
+static int criu_dump_queue(struct kfd_process_device *pdd,
 			   struct queue *q,
 			   struct kfd_criu_queue_bucket *q_bucket,
 			   void *private_data)
 {
 	struct kfd_criu_queue_priv_data *q_data = (struct kfd_criu_queue_priv_data *) private_data;
-	uint8_t *cu_mask;
+	uint8_t *cu_mask, *mqd;
+	int ret;
 
 	cu_mask = (void *)(q_data + 1);
+	mqd = cu_mask + q_data->cu_mask_size;
 
 	q_bucket->gpu_id = pdd->dev->id;
 	q_data->type = q->properties.type;
@@ -2043,7 +2058,14 @@ static void criu_dump_queue(struct kfd_process_device *pdd,
 	if (q_data->cu_mask_size)
 		memcpy(cu_mask, q->properties.cu_mask, q_data->cu_mask_size);
 
+	ret = pqm_dump_mqd(&pdd->process->pqm, q->properties.queue_id, mqd);
+	if (ret) {
+		pr_err("Failed dump queue_mqd (%d)\n", ret);
+		return ret;
+	}
+
 	pr_debug("Dumping Queue: gpu_id:%x queue_id:%u\n", q_bucket->gpu_id, q_data->q_id);
+	return ret;
 }
 
 static int criu_dump_queues_device(struct kfd_process_device *pdd,
@@ -2063,6 +2085,7 @@ static int criu_dump_queues_device(struct kfd_process_device *pdd,
 		struct kfd_criu_queue_priv_data *q_data;
 		uint64_t q_data_size;
 		uint32_t cu_mask_size;
+		uint32_t mqd_size;
 
 		if (q->properties.type != KFD_QUEUE_TYPE_COMPUTE &&
 			q->properties.type != KFD_QUEUE_TYPE_SDMA &&
@@ -2074,9 +2097,11 @@ static int criu_dump_queues_device(struct kfd_process_device *pdd,
 
 		memset(&q_bucket, 0, sizeof(q_bucket));
 
-		get_queue_data_sizes(pdd, q, &cu_mask_size);
+		ret = get_queue_data_sizes(pdd, q, &cu_mask_size, &mqd_size);
+		if (ret)
+			return ret;
 
-		q_data_size = sizeof(*q_data) + cu_mask_size;
+		q_data_size = sizeof(*q_data) + cu_mask_size + mqd_size;
 
 		/* Increase local buffer space if needed */
 		if (q_private_data_size < q_data_size) {
@@ -2093,8 +2118,11 @@ static int criu_dump_queues_device(struct kfd_process_device *pdd,
 		q_data = (struct kfd_criu_queue_priv_data *)q_private_data;
 
 		q_data->cu_mask_size = cu_mask_size;
+		q_data->mqd_size = mqd_size;
 
-		criu_dump_queue(pdd, q, &q_bucket, q_data);
+		ret = criu_dump_queue(pdd, q, &q_bucket, q_data);
+		if (ret)
+			break;
 
 		q_bucket.priv_data_offset = *queues_priv_data_offset;
 		q_bucket.priv_data_size = q_data_size;
@@ -2524,7 +2552,7 @@ static int criu_restore_queue(struct kfd_process *p,
 			      void *private_data)
 {
 	struct kfd_criu_queue_priv_data *q_data = (struct kfd_criu_queue_priv_data *) private_data;
-	uint8_t *cu_mask, *mqd, *ctl_stack;
+	uint8_t *cu_mask, *mqd;
 	struct queue_properties qp;
 	unsigned int queue_id;
 	int ret = 0;
@@ -2534,7 +2562,6 @@ static int criu_restore_queue(struct kfd_process *p,
 	/* data stored in this order: cu_mask, mqd, ctl_stack */
 	cu_mask = (void *)(q_data + 1);
 	mqd = cu_mask + q_data->cu_mask_size;
-	ctl_stack = mqd + q_data->mqd_size;
 
 	memset(&qp, 0, sizeof(qp));
 	ret = set_queue_properties_from_criu(&qp, q_bucket, q_data, cu_mask);
@@ -2543,7 +2570,7 @@ static int criu_restore_queue(struct kfd_process *p,
 
 	print_queue_properties(&qp);
 
-	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, q_data, NULL);
+	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, q_data, mqd, NULL);
 	if (ret) {
 		pr_err("Failed to create new queue err:%d\n", ret);
 		ret = -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index 749a7a3bf191..c6c0cd47e7f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
 	properties.type = KFD_QUEUE_TYPE_DIQ;
 
 	status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-				&properties, &qid, NULL, NULL);
+				&properties, &qid, NULL, NULL, NULL);
 
 	if (status) {
 		pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 5c268c7726d2..14199e467e96 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -318,7 +318,8 @@ static void deallocate_vmid(struct device_queue_manager *dqm,
 static int create_queue_nocpsch(struct device_queue_manager *dqm,
 				struct queue *q,
 				struct qcm_process_device *qpd,
-				const struct kfd_criu_queue_priv_data *qd)
+				const struct kfd_criu_queue_priv_data *qd,
+				const void *restore_mqd)
 {
 	struct mqd_manager *mqd_mgr;
 	int retval;
@@ -377,8 +378,14 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 		retval = -ENOMEM;
 		goto out_deallocate_doorbell;
 	}
-	mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
-				&q->gart_mqd_addr, &q->properties);
+
+	if (qd)
+		mqd_mgr->restore_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
+				&q->gart_mqd_addr, &q->properties, restore_mqd);
+	else
+		mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
+					&q->gart_mqd_addr, &q->properties);
+
 	if (q->properties.is_active) {
 		if (!dqm->sched_running) {
 			WARN_ONCE(1, "Load non-HWS mqd while stopped\n");
@@ -1314,7 +1321,8 @@ static void destroy_kernel_queue_cpsch(struct device_queue_manager *dqm,
 
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 			struct qcm_process_device *qpd,
-			const struct kfd_criu_queue_priv_data *qd)
+			const struct kfd_criu_queue_priv_data *qd,
+			const void *restore_mqd)
 {
 	int retval;
 	struct mqd_manager *mqd_mgr;
@@ -1360,8 +1368,13 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 	 * updates the is_evicted flag but is a no-op otherwise.
 	 */
 	q->properties.is_evicted = !!qpd->evicted;
-	mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
-				&q->gart_mqd_addr, &q->properties);
+
+	if (qd)
+		mqd_mgr->restore_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
+				&q->gart_mqd_addr, &q->properties, restore_mqd);
+	else
+		mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
+					&q->gart_mqd_addr, &q->properties);
 
 	list_add(&q->list, &qpd->queues_list);
 	qpd->queue_count++;
@@ -1744,6 +1757,47 @@ static int get_wave_state(struct device_queue_manager *dqm,
 	return r;
 }
 
+static void get_queue_dump_info(struct device_queue_manager *dqm,
+			const struct queue *q,
+			u32 *mqd_size)
+{
+	struct mqd_manager *mqd_mgr;
+	enum KFD_MQD_TYPE mqd_type =
+			get_mqd_type_from_queue_type(q->properties.type);
+
+	mqd_mgr = dqm->mqd_mgrs[mqd_type];
+	*mqd_size = mqd_mgr->mqd_size;
+}
+
+static int dump_mqd(struct device_queue_manager *dqm,
+			  const struct queue *q,
+			  void *mqd)
+{
+	struct mqd_manager *mqd_mgr;
+	int r = 0;
+	enum KFD_MQD_TYPE mqd_type =
+			get_mqd_type_from_queue_type(q->properties.type);
+
+	dqm_lock(dqm);
+
+	if (q->properties.is_active || !q->device->cwsr_enabled) {
+		r = -EINVAL;
+		goto dqm_unlock;
+	}
+
+	mqd_mgr = dqm->mqd_mgrs[mqd_type];
+	if (!mqd_mgr->dump_mqd) {
+		r = -EOPNOTSUPP;
+		goto dqm_unlock;
+	}
+
+	mqd_mgr->dump_mqd(mqd_mgr, q->mqd, mqd);
+
+dqm_unlock:
+	dqm_unlock(dqm);
+	return r;
+}
+
 static int process_termination_cpsch(struct device_queue_manager *dqm,
 		struct qcm_process_device *qpd)
 {
@@ -1918,6 +1972,8 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.evict_process_queues = evict_process_queues_cpsch;
 		dqm->ops.restore_process_queues = restore_process_queues_cpsch;
 		dqm->ops.get_wave_state = get_wave_state;
+		dqm->ops.get_queue_dump_info = get_queue_dump_info;
+		dqm->ops.dump_mqd = dump_mqd;
 		break;
 	case KFD_SCHED_POLICY_NO_HWS:
 		/* initialize dqm for no cp scheduling */
@@ -1937,6 +1993,8 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.restore_process_queues =
 			restore_process_queues_nocpsch;
 		dqm->ops.get_wave_state = get_wave_state;
+		dqm->ops.get_queue_dump_info = get_queue_dump_info;
+		dqm->ops.dump_mqd = dump_mqd;
 		break;
 	default:
 		pr_err("Invalid scheduling policy %d\n", dqm->sched_policy);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 02cfa098ca1c..ae4170aece6d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -87,7 +87,8 @@ struct device_queue_manager_ops {
 	int	(*create_queue)(struct device_queue_manager *dqm,
 				struct queue *q,
 				struct qcm_process_device *qpd,
-				const struct kfd_criu_queue_priv_data *qd);
+				const struct kfd_criu_queue_priv_data *qd,
+				const void *restore_mqd);
 
 	int	(*destroy_queue)(struct device_queue_manager *dqm,
 				struct qcm_process_device *qpd,
@@ -135,6 +136,14 @@ struct device_queue_manager_ops {
 				  void __user *ctl_stack,
 				  u32 *ctl_stack_used_size,
 				  u32 *save_area_used_size);
+
+	void	(*get_queue_dump_info)(struct device_queue_manager *dqm,
+				  const struct queue *q,
+				  u32 *mqd_size);
+
+	int	(*dump_mqd)(struct device_queue_manager *dqm,
+				  const struct queue *q,
+				  void *mqd);
 };
 
 struct device_queue_manager_asic_ops {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
index b5e2ea7550d4..497e6f874352 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
@@ -98,6 +98,13 @@ struct mqd_manager {
 				  u32 *ctl_stack_used_size,
 				  u32 *save_area_used_size);
 
+	void	(*dump_mqd)(struct mqd_manager *mm, void *mqd, void *mqd_dst);
+
+	void	(*restore_mqd)(struct mqd_manager *mm, void **mqd,
+				struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+				struct queue_properties *p,
+				const void *mqd_src);
+
 #if defined(CONFIG_DEBUG_FS)
 	int	(*debugfs_show_mqd)(struct seq_file *m, void *data);
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index 064914e1e8d6..1d000252080c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -275,6 +275,69 @@ static int destroy_mqd(struct mqd_manager *mm, void *mqd,
 					pipe_id, queue_id);
 }
 
+static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+{
+	struct cik_mqd *m;
+
+	m = get_mqd(mqd);
+
+	memcpy(mqd_dst, m, sizeof(struct cik_mqd));
+}
+
+static void restore_mqd(struct mqd_manager *mm, void **mqd,
+			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+			struct queue_properties *qp,
+			const void *mqd_src)
+{
+	uint64_t addr;
+	struct cik_mqd *m;
+
+	m = (struct cik_mqd *) mqd_mem_obj->cpu_ptr;
+	addr = mqd_mem_obj->gpu_addr;
+
+	memcpy(m, mqd_src, sizeof(*m));
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+
+	m->cp_hqd_pq_doorbell_control = DOORBELL_OFFSET(qp->doorbell_off);
+
+	pr_debug("cp_hqd_pq_doorbell_control 0x%x\n",
+			m->cp_hqd_pq_doorbell_control);
+
+	qp->is_active = 0;
+}
+
+static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+{
+	struct cik_sdma_rlc_registers *m;
+
+	m = get_sdma_mqd(mqd);
+
+	memcpy(mqd_dst, m, sizeof(struct cik_sdma_rlc_registers));
+}
+
+static void restore_mqd_sdma(struct mqd_manager *mm, void **mqd,
+				struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+				struct queue_properties *qp,
+				const void *mqd_src)
+{
+	uint64_t addr;
+	struct cik_sdma_rlc_registers *m;
+
+	m = (struct cik_sdma_rlc_registers *) mqd_mem_obj->cpu_ptr;
+	addr = mqd_mem_obj->gpu_addr;
+
+	memcpy(m, mqd_src, sizeof(*m));
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+
+	qp->is_active = 0;
+}
+
 /*
  * preempt type here is ignored because there is only one way
  * to preempt sdma queue
@@ -388,6 +451,8 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd;
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
+		mqd->dump_mqd = dump_mqd;
+		mqd->restore_mqd = restore_mqd;
 		mqd->mqd_size = sizeof(struct cik_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
@@ -428,6 +493,8 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_sdma;
 		mqd->destroy_mqd = destroy_mqd_sdma;
 		mqd->is_occupied = is_occupied_sdma;
+		mqd->dump_mqd = dump_mqd_sdma;
+		mqd->restore_mqd = restore_mqd_sdma;
 		mqd->mqd_size = sizeof(struct cik_sdma_rlc_registers);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
index c7fb59ca597f..0066a2cf5672 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
@@ -283,6 +283,41 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
 	return 0;
 }
 
+static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+{
+	struct v10_compute_mqd *m;
+
+	m = get_mqd(mqd);
+
+	memcpy(mqd_dst, m, sizeof(struct v10_compute_mqd));
+}
+
+static void restore_mqd(struct mqd_manager *mm, void **mqd,
+			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+			struct queue_properties *qp,
+			const void *mqd_src)
+{
+	uint64_t addr;
+	struct v10_compute_mqd *m;
+
+	m = (struct v10_compute_mqd *) mqd_mem_obj->cpu_ptr;
+	addr = mqd_mem_obj->gpu_addr;
+
+	memcpy(m, mqd_src, sizeof(*m));
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+
+	m->cp_hqd_pq_doorbell_control =
+		qp->doorbell_off <<
+			CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
+	pr_debug("cp_hqd_pq_doorbell_control 0x%x\n",
+			m->cp_hqd_pq_doorbell_control);
+
+	qp->is_active = 0;
+}
+
 static void init_mqd_hiq(struct mqd_manager *mm, void **mqd,
 			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 			struct queue_properties *q)
@@ -370,6 +405,35 @@ static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
 	return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
 }
 
+static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+{
+	struct v10_sdma_mqd *m;
+
+	m = get_sdma_mqd(mqd);
+
+	memcpy(mqd_dst, m, sizeof(struct v10_sdma_mqd));
+}
+
+static void restore_mqd_sdma(struct mqd_manager *mm, void **mqd,
+			     struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+			     struct queue_properties *qp,
+			     const void *mqd_src)
+{
+	uint64_t addr;
+	struct v10_sdma_mqd *m;
+
+	m = (struct v10_sdma_mqd *) mqd_mem_obj->cpu_ptr;
+	addr = mqd_mem_obj->gpu_addr;
+
+	memcpy(m, mqd_src, sizeof(*m));
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+
+	qp->is_active = 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static int debugfs_show_mqd(struct seq_file *m, void *data)
@@ -414,6 +478,8 @@ struct mqd_manager *mqd_manager_init_v10(enum KFD_MQD_TYPE type,
 		mqd->is_occupied = is_occupied;
 		mqd->mqd_size = sizeof(struct v10_compute_mqd);
 		mqd->get_wave_state = get_wave_state;
+		mqd->dump_mqd = dump_mqd;
+		mqd->restore_mqd = restore_mqd;
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
 #endif
@@ -458,6 +524,8 @@ struct mqd_manager *mqd_manager_init_v10(enum KFD_MQD_TYPE type,
 		mqd->destroy_mqd = destroy_mqd_sdma;
 		mqd->is_occupied = is_occupied_sdma;
 		mqd->mqd_size = sizeof(struct v10_sdma_mqd);
+		mqd->dump_mqd = dump_mqd_sdma;
+		mqd->restore_mqd = restore_mqd_sdma;
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
 #endif
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 7f4e102ff4bd..5b6beb69dfc2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -338,6 +338,41 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
 	return 0;
 }
 
+static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+{
+	struct v9_mqd *m;
+
+	m = get_mqd(mqd);
+
+	memcpy(mqd_dst, m, sizeof(struct v9_mqd));
+}
+
+static void restore_mqd(struct mqd_manager *mm, void **mqd,
+			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+			struct queue_properties *qp, const void *mqd_src)
+{
+	uint64_t addr;
+	struct v9_mqd *m;
+
+	m = (struct v9_mqd *) mqd_mem_obj->cpu_ptr;
+	addr = mqd_mem_obj->gpu_addr;
+
+	memcpy(m, mqd_src, sizeof(*m));
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+
+	/* Control stack is located one page after MQD. */
+	m->cp_hqd_pq_doorbell_control =
+		qp->doorbell_off <<
+			CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
+	pr_debug("cp_hqd_pq_doorbell_control 0x%x\n",
+				m->cp_hqd_pq_doorbell_control);
+
+	qp->is_active = 0;
+}
+
 static void init_mqd_hiq(struct mqd_manager *mm, void **mqd,
 			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 			struct queue_properties *q)
@@ -425,6 +460,34 @@ static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
 	return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
 }
 
+static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+{
+	struct v9_sdma_mqd *m;
+
+	m = get_sdma_mqd(mqd);
+
+	memcpy(mqd_dst, m, sizeof(struct v9_sdma_mqd));
+}
+
+static void restore_mqd_sdma(struct mqd_manager *mm, void **mqd,
+			     struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+			     struct queue_properties *qp, const void *mqd_src)
+{
+	uint64_t addr;
+	struct v9_sdma_mqd *m;
+
+	m = (struct v9_sdma_mqd *) mqd_mem_obj->cpu_ptr;
+	addr = mqd_mem_obj->gpu_addr;
+
+	memcpy(m, mqd_src, sizeof(*m));
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+
+	qp->is_active = 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static int debugfs_show_mqd(struct seq_file *m, void *data)
@@ -467,6 +530,8 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
 		mqd->get_wave_state = get_wave_state;
+		mqd->dump_mqd = dump_mqd;
+		mqd->restore_mqd = restore_mqd;
 		mqd->mqd_size = sizeof(struct v9_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
@@ -507,6 +572,8 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_sdma;
 		mqd->destroy_mqd = destroy_mqd_sdma;
 		mqd->is_occupied = is_occupied_sdma;
+		mqd->dump_mqd = dump_mqd_sdma;
+		mqd->restore_mqd = restore_mqd_sdma;
 		mqd->mqd_size = sizeof(struct v9_sdma_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index 33dbd22d290f..ae5e3edec92e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -303,6 +303,41 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
 	return 0;
 }
 
+static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+{
+	struct vi_mqd *m;
+
+	m = get_mqd(mqd);
+
+	memcpy(mqd_dst, m, sizeof(struct vi_mqd));
+}
+
+static void restore_mqd(struct mqd_manager *mm, void **mqd,
+			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+			struct queue_properties *qp,
+			const void *mqd_src)
+{
+	uint64_t addr;
+	struct vi_mqd *m;
+
+	m = (struct vi_mqd *) mqd_mem_obj->cpu_ptr;
+	addr = mqd_mem_obj->gpu_addr;
+
+	memcpy(m, mqd_src, sizeof(*m));
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+
+	m->cp_hqd_pq_doorbell_control =
+		qp->doorbell_off <<
+			CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
+	pr_debug("cp_hqd_pq_doorbell_control 0x%x\n",
+			m->cp_hqd_pq_doorbell_control);
+
+	qp->is_active = 0;
+}
+
 static void init_mqd_hiq(struct mqd_manager *mm, void **mqd,
 			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 			struct queue_properties *q)
@@ -394,6 +429,35 @@ static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
 	return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
 }
 
+static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+{
+	struct vi_sdma_mqd *m;
+
+	m = get_sdma_mqd(mqd);
+
+	memcpy(mqd_dst, m, sizeof(struct vi_sdma_mqd));
+}
+
+static void restore_mqd_sdma(struct mqd_manager *mm, void **mqd,
+			     struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
+			     struct queue_properties *qp,
+			     const void *mqd_src)
+{
+	uint64_t addr;
+	struct vi_sdma_mqd *m;
+
+	m = (struct vi_sdma_mqd *) mqd_mem_obj->cpu_ptr;
+	addr = mqd_mem_obj->gpu_addr;
+
+	memcpy(m, mqd_src, sizeof(*m));
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+
+	qp->is_active = 0;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static int debugfs_show_mqd(struct seq_file *m, void *data)
@@ -436,6 +500,8 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
 		mqd->get_wave_state = get_wave_state;
+		mqd->dump_mqd = dump_mqd;
+		mqd->restore_mqd = restore_mqd;
 		mqd->mqd_size = sizeof(struct vi_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd;
@@ -476,6 +542,8 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd_sdma;
 		mqd->destroy_mqd = destroy_mqd_sdma;
 		mqd->is_occupied = is_occupied_sdma;
+		mqd->dump_mqd = dump_mqd_sdma;
+		mqd->restore_mqd = restore_mqd_sdma;
 		mqd->mqd_size = sizeof(struct vi_sdma_mqd);
 #if defined(CONFIG_DEBUG_FS)
 		mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index eaf5fe1480e9..5d9efcc63208 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1116,6 +1116,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			    struct queue_properties *properties,
 			    unsigned int *qid,
 			    const struct kfd_criu_queue_priv_data *q_data,
+			    const void *restore_mqd,
 			    uint32_t *p_doorbell_offset_in_process);
 int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
 int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid,
@@ -1137,6 +1138,12 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
 			      uint64_t fence_value,
 			      unsigned int timeout_ms);
+int pqm_get_queue_dump_info(struct process_queue_manager *pqm,
+			unsigned int qid,
+			u32 *mqd_size);
+int pqm_dump_mqd(struct process_queue_manager *pqm,
+		       unsigned int qid,
+		       void *dst_mqd);
 
 /* Packet Manager */
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index f30e128ee9c5..a7eb1c6a700f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -208,6 +208,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			    struct queue_properties *properties,
 			    unsigned int *qid,
 			    const struct kfd_criu_queue_priv_data *q_data,
+			    const void *restore_mqd,
 			    uint32_t *p_doorbell_offset_in_process)
 {
 	int retval;
@@ -272,7 +273,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			goto err_create_queue;
 		pqn->q = q;
 		pqn->kq = NULL;
-		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data);
+		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data, restore_mqd);
 		print_queue(q);
 		break;
 
@@ -292,7 +293,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			goto err_create_queue;
 		pqn->q = q;
 		pqn->kq = NULL;
-		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data);
+		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data, restore_mqd);
 		print_queue(q);
 		break;
 	case KFD_QUEUE_TYPE_DIQ:
@@ -527,6 +528,48 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 						       save_area_used_size);
 }
 
+int pqm_get_queue_dump_info(struct process_queue_manager *pqm,
+			unsigned int qid,
+			u32 *mqd_size)
+{
+	struct process_queue_node *pqn;
+
+	pqn = get_queue_by_qid(pqm, qid);
+	if (!pqn) {
+		pr_debug("amdkfd: No queue %d exists for operation\n", qid);
+		return -EFAULT;
+	}
+
+	if (!pqn->q->device->dqm->ops.get_queue_dump_info) {
+		pr_err("amdkfd: queue dumping not supported on this device\n");
+		return -EOPNOTSUPP;
+	}
+
+	pqn->q->device->dqm->ops.get_queue_dump_info(pqn->q->device->dqm,
+						       pqn->q, mqd_size);
+	return 0;
+}
+
+int pqm_dump_mqd(struct process_queue_manager *pqm,
+		       unsigned int qid, void *mqd)
+{
+	struct process_queue_node *pqn;
+
+	pqn = get_queue_by_qid(pqm, qid);
+	if (!pqn) {
+		pr_debug("amdkfd: No queue %d exists for operation\n", qid);
+		return -EFAULT;
+	}
+
+	if (!pqn->q->device->dqm->ops.dump_mqd) {
+		pr_err("amdkfd: queue dumping not supported on this device\n");
+		return -EOPNOTSUPP;
+	}
+
+	return pqn->q->device->dqm->ops.dump_mqd(pqn->q->device->dqm,
+						       pqn->q, mqd);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 int pqm_debugfs_mqds(struct seq_file *m, void *data)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 14/18] drm/amdkfd: CRIU dump/restore queue control stack
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (12 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 13/18] drm/amdkfd: CRIU dump and restore queue mqds David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-19 13:37 ` [PATCH 15/18] drm/amdkfd: CRIU dump and restore events David Yat Sin
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

Dump contents of queue control stacks on CRIU dump and restore them
during CRIU restore.

(rajneesh: rebased to 5.11 and fixed merge conflict)
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
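A note on the layout the chardev changes below rely on: each queue's
private data is round-tripped as a single contiguous blob ordered as
priv_data | cu_mask | mqd | ctl_stack, and both the dump and restore
paths recover the regions with plain pointer arithmetic. A minimal
sketch of that math (the helper name is hypothetical; the struct comes
from kfd_priv.h):

	static void split_queue_blob(void *buf,
				     struct kfd_criu_queue_priv_data **q_data,
				     uint8_t **cu_mask, uint8_t **mqd,
				     uint8_t **ctl_stack)
	{
		*q_data = buf;				/* header comes first */
		*cu_mask = (uint8_t *)(*q_data + 1);	/* then the CU mask   */
		*mqd = *cu_mask + (*q_data)->cu_mask_size;
		*ctl_stack = *mqd + (*q_data)->mqd_size;
	}
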
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      | 31 ++++++++++++-------
 drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c       |  2 +-
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 30 ++++++++++++------
 .../drm/amd/amdkfd/kfd_device_queue_manager.h | 10 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h  |  8 +++--
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c  | 17 +++++++---
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c  | 18 ++++++++---
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c   | 27 +++++++++++++---
 .../gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c   | 17 +++++++---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  8 +++--
 .../amd/amdkfd/kfd_process_queue_manager.c    | 19 +++++++-----
 11 files changed, 133 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 68b06037616f..19f16e3dd769 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -313,7 +313,7 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 			dev->id);
 
 	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, NULL,
-			       NULL, &doorbell_offset_in_process);
+			NULL, NULL, &doorbell_offset_in_process);
 	if (err != 0)
 		goto err_create_queue;
 
@@ -1968,13 +1968,15 @@ static int criu_dump_bos(struct kfd_process *p, struct kfd_ioctl_criu_dumper_arg
 static int get_queue_data_sizes(struct kfd_process_device *pdd,
 				struct queue *q,
 				uint32_t *cu_mask_size,
-				uint32_t *mqd_size)
+				uint32_t *mqd_size,
+				uint32_t *ctl_stack_size)
 {
 	int ret;
 
 	*cu_mask_size = sizeof(uint32_t) * (q->properties.cu_mask_count / 32);
 
-	ret = pqm_get_queue_dump_info(&pdd->process->pqm, q->properties.queue_id, mqd_size);
+	ret = pqm_get_queue_dump_info(&pdd->process->pqm, q->properties.queue_id, mqd_size,
+				      ctl_stack_size);
 	if (ret)
 		pr_err("Failed to get queue dump info (%d)\n", ret);
 
@@ -1998,13 +2000,15 @@ int get_process_queue_info(struct kfd_process *p, uint32_t *num_queues, uint32_t
 				q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
 				u32 cu_mask_size;
 				u32 mqd_size;
+				u32 ctl_stack_size;
 				int ret;
 
-				ret = get_queue_data_sizes(pdd, q, &cu_mask_size, &mqd_size);
+				ret = get_queue_data_sizes(pdd, q, &cu_mask_size, &mqd_size,
+							   &ctl_stack_size);
 				if (ret)
 					return ret;
 
-				data_sizes += cu_mask_size + mqd_size;
+				data_sizes += cu_mask_size + mqd_size + ctl_stack_size;
 				q_index++;
 			} else {
 				pr_err("Unsupported queue type (%d)\n", q->properties.type);
@@ -2024,11 +2028,12 @@ static int criu_dump_queue(struct kfd_process_device *pdd,
 			   void *private_data)
 {
 	struct kfd_criu_queue_priv_data *q_data = (struct kfd_criu_queue_priv_data *) private_data;
-	uint8_t *cu_mask, *mqd;
+	uint8_t *cu_mask, *mqd, *ctl_stack;
 	int ret;
 
 	cu_mask = (void *)(q_data + 1);
 	mqd = cu_mask + q_data->cu_mask_size;
+	ctl_stack = mqd + q_data->mqd_size;
 
 	q_bucket->gpu_id = pdd->dev->id;
 	q_data->type = q->properties.type;
@@ -2058,7 +2063,7 @@ static int criu_dump_queue(struct kfd_process_device *pdd,
 	if (q_data->cu_mask_size)
 		memcpy(cu_mask, q->properties.cu_mask, q_data->cu_mask_size);
 
-	ret = pqm_dump_mqd(&pdd->process->pqm, q->properties.queue_id, mqd);
+	ret = pqm_dump_mqd(&pdd->process->pqm, q->properties.queue_id, mqd, ctl_stack);
 	if (ret) {
 		pr_err("Failed dump queue_mqd (%d)\n", ret);
 		return ret;
@@ -2086,6 +2091,7 @@ static int criu_dump_queues_device(struct kfd_process_device *pdd,
 		uint64_t q_data_size;
 		uint32_t cu_mask_size;
 		uint32_t mqd_size;
+		uint32_t ctl_stack_size;
 
 		if (q->properties.type != KFD_QUEUE_TYPE_COMPUTE &&
 			q->properties.type != KFD_QUEUE_TYPE_SDMA &&
@@ -2097,11 +2103,11 @@ static int criu_dump_queues_device(struct kfd_process_device *pdd,
 
 		memset(&q_bucket, 0, sizeof(q_bucket));
 
-		ret = get_queue_data_sizes(pdd, q, &cu_mask_size, &mqd_size);
+		ret = get_queue_data_sizes(pdd, q, &cu_mask_size, &mqd_size, &ctl_stack_size);
 		if (ret)
 			return ret;
 
-		q_data_size = sizeof(*q_data) + cu_mask_size + mqd_size;
+		q_data_size = sizeof(*q_data) + cu_mask_size + mqd_size + ctl_stack_size;
 
 		/* Increase local buffer space if needed */
 		if (q_private_data_size < q_data_size) {
@@ -2117,8 +2123,10 @@ static int criu_dump_queues_device(struct kfd_process_device *pdd,
 
 		q_data = (struct kfd_criu_queue_priv_data *)q_private_data;
 
+		/* data stored in this order: priv_data, cu_mask, mqd, ctl_stack */
 		q_data->cu_mask_size = cu_mask_size;
 		q_data->mqd_size = mqd_size;
+		q_data->ctl_stack_size = ctl_stack_size;
 
 		ret = criu_dump_queue(pdd, q, &q_bucket, q_data);
 		if (ret)
@@ -2552,7 +2560,7 @@ static int criu_restore_queue(struct kfd_process *p,
 			      void *private_data)
 {
 	struct kfd_criu_queue_priv_data *q_data = (struct kfd_criu_queue_priv_data *) private_data;
-	uint8_t *cu_mask, *mqd;
+	uint8_t *cu_mask, *mqd, *ctl_stack;
 	struct queue_properties qp;
 	unsigned int queue_id;
 	int ret = 0;
@@ -2562,6 +2570,7 @@ static int criu_restore_queue(struct kfd_process *p,
 	/* data stored in this order: cu_mask, mqd, ctl_stack */
 	cu_mask = (void *)(q_data + 1);
 	mqd = cu_mask + q_data->cu_mask_size;
+	ctl_stack = mqd + q_data->mqd_size;
 
 	memset(&qp, 0, sizeof(qp));
 	ret = set_queue_properties_from_criu(&qp, q_bucket, q_data, cu_mask);
@@ -2570,7 +2579,7 @@ static int criu_restore_queue(struct kfd_process *p,
 
 	print_queue_properties(&qp);
 
-	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, q_data, mqd, NULL);
+	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, q_data, mqd, ctl_stack, NULL);
 	if (ret) {
 		pr_err("Failed to create new queue err:%d\n", ret);
 		ret = -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
index c6c0cd47e7f7..3c29e60b967f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
@@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
 	properties.type = KFD_QUEUE_TYPE_DIQ;
 
 	status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
-				&properties, &qid, NULL, NULL, NULL);
+				&properties, &qid, NULL, NULL, NULL, NULL);
 
 	if (status) {
 		pr_err("Failed to create DIQ\n");
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 14199e467e96..5943dcf1720f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -319,7 +319,8 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 				struct queue *q,
 				struct qcm_process_device *qpd,
 				const struct kfd_criu_queue_priv_data *qd,
-				const void *restore_mqd)
+				const void *restore_mqd,
+				const void *restore_ctl_stack)
 {
 	struct mqd_manager *mqd_mgr;
 	int retval;
@@ -380,8 +381,9 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 	}
 
 	if (qd)
-		mqd_mgr->restore_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
-				&q->gart_mqd_addr, &q->properties, restore_mqd);
+		mqd_mgr->restore_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj, &q->gart_mqd_addr,
+				     &q->properties, restore_mqd,
+				     restore_ctl_stack, qd->ctl_stack_size);
 	else
 		mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
 					&q->gart_mqd_addr, &q->properties);
@@ -1322,7 +1324,7 @@ static void destroy_kernel_queue_cpsch(struct device_queue_manager *dqm,
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 			struct qcm_process_device *qpd,
 			const struct kfd_criu_queue_priv_data *qd,
-			const void *restore_mqd)
+			const void *restore_mqd, const void *restore_ctl_stack)
 {
 	int retval;
 	struct mqd_manager *mqd_mgr;
@@ -1370,8 +1372,9 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 	q->properties.is_evicted = !!qpd->evicted;
 
 	if (qd)
-		mqd_mgr->restore_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
-				&q->gart_mqd_addr, &q->properties, restore_mqd);
+		mqd_mgr->restore_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj, &q->gart_mqd_addr,
+				     &q->properties, restore_mqd, restore_ctl_stack,
+				     qd->ctl_stack_size);
 	else
 		mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
 					&q->gart_mqd_addr, &q->properties);
@@ -1759,19 +1762,28 @@ static int get_wave_state(struct device_queue_manager *dqm,
 
 static void get_queue_dump_info(struct device_queue_manager *dqm,
 			const struct queue *q,
-			u32 *mqd_size)
+			u32 *mqd_size,
+			u32 *ctl_stack_size)
 {
 	struct mqd_manager *mqd_mgr;
 	enum KFD_MQD_TYPE mqd_type =
 			get_mqd_type_from_queue_type(q->properties.type);
 
+	dqm_lock(dqm);
 	mqd_mgr = dqm->mqd_mgrs[mqd_type];
 	*mqd_size = mqd_mgr->mqd_size;
+	*ctl_stack_size = 0;
+
+	if (q->properties.type == KFD_QUEUE_TYPE_COMPUTE && mqd_mgr->get_dump_info)
+		mqd_mgr->get_dump_info(mqd_mgr, q->mqd, ctl_stack_size);
+
+	dqm_unlock(dqm);
 }
 
 static int dump_mqd(struct device_queue_manager *dqm,
 			  const struct queue *q,
-			  void *mqd)
+			  void *mqd,
+			  void *ctl_stack)
 {
 	struct mqd_manager *mqd_mgr;
 	int r = 0;
@@ -1791,7 +1803,7 @@ static int dump_mqd(struct device_queue_manager *dqm,
 		goto dqm_unlock;
 	}
 
-	mqd_mgr->dump_mqd(mqd_mgr, q->mqd, mqd);
+	mqd_mgr->dump_mqd(mqd_mgr, q->mqd, mqd, ctl_stack);
 
 dqm_unlock:
 	dqm_unlock(dqm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index ae4170aece6d..9d7d1308df71 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -88,7 +88,8 @@ struct device_queue_manager_ops {
 				struct queue *q,
 				struct qcm_process_device *qpd,
 				const struct kfd_criu_queue_priv_data *qd,
-				const void *restore_mqd);
+				const void *restore_mqd,
+				const void *restore_ctl_stack);
 
 	int	(*destroy_queue)(struct device_queue_manager *dqm,
 				struct qcm_process_device *qpd,
@@ -138,12 +139,13 @@ struct device_queue_manager_ops {
 				  u32 *save_area_used_size);
 
 	void	(*get_queue_dump_info)(struct device_queue_manager *dqm,
-				  const struct queue *q,
-				  u32 *mqd_size);
+				  const struct queue *q, u32 *mqd_size,
+				  u32 *ctl_stack_size);
 
 	int	(*dump_mqd)(struct device_queue_manager *dqm,
 				  const struct queue *q,
-				  void *mqd);
+				  void *mqd,
+				  void *ctl_stack);
 };
 
 struct device_queue_manager_asic_ops {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
index 497e6f874352..bb91b95b4970 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.h
@@ -98,12 +98,16 @@ struct mqd_manager {
 				  u32 *ctl_stack_used_size,
 				  u32 *save_area_used_size);
 
-	void	(*dump_mqd)(struct mqd_manager *mm, void *mqd, void *mqd_dst);
+	void	(*get_dump_info)(struct mqd_manager *mm, void *mqd, uint32_t *ctl_stack_size);
+
+	void	(*dump_mqd)(struct mqd_manager *mm, void *mqd, void *mqd_dst, void *ctl_stack_dst);
 
 	void	(*restore_mqd)(struct mqd_manager *mm, void **mqd,
 				struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 				struct queue_properties *p,
-				const void *mqd_src);
+				const void *mqd_src,
+				const void *ctl_stack_src,
+				const u32 ctl_stack_size);
 
 #if defined(CONFIG_DEBUG_FS)
 	int	(*debugfs_show_mqd)(struct seq_file *m, void *data);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index 1d000252080c..bf32c67b723a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -275,7 +275,13 @@ static int destroy_mqd(struct mqd_manager *mm, void *mqd,
 					pipe_id, queue_id);
 }
 
-static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+static void get_dump_info(struct mqd_manager *mm, void *mqd, u32 *ctl_stack_size)
+{
+	/* Control stack is stored in user mode */
+	*ctl_stack_size = 0;
+}
+
+static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst, void *ctl_stack_dst)
 {
 	struct cik_mqd *m;
 
@@ -287,7 +293,8 @@ static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
 static void restore_mqd(struct mqd_manager *mm, void **mqd,
 			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 			struct queue_properties *qp,
-			const void *mqd_src)
+			const void *mqd_src,
+			const void *ctl_stack_src, const u32 ctl_stack_size)
 {
 	uint64_t addr;
 	struct cik_mqd *m;
@@ -309,7 +316,7 @@ static void restore_mqd(struct mqd_manager *mm, void **mqd,
 	qp->is_active = 0;
 }
 
-static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst, void *ctl_stack_dst)
 {
 	struct cik_sdma_rlc_registers *m;
 
@@ -321,7 +328,8 @@ static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
 static void restore_mqd_sdma(struct mqd_manager *mm, void **mqd,
 				struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 				struct queue_properties *qp,
-				const void *mqd_src)
+				const void *mqd_src,
+				const void *ctl_stack_src, const u32 ctl_stack_size)
 {
 	uint64_t addr;
 	struct cik_sdma_rlc_registers *m;
@@ -451,6 +459,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 		mqd->update_mqd = update_mqd;
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
+		mqd->get_dump_info = get_dump_info;
 		mqd->dump_mqd = dump_mqd;
 		mqd->restore_mqd = restore_mqd;
 		mqd->mqd_size = sizeof(struct cik_mqd);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
index 0066a2cf5672..6dd06ee98b60 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c
@@ -283,7 +283,13 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
 	return 0;
 }
 
-static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+static void get_dump_info(struct mqd_manager *mm, void *mqd, u32 *ctl_stack_size)
+{
+	/* Control stack is stored in user mode */
+	*ctl_stack_size = 0;
+}
+
+static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst, void *ctl_stack_dst)
 {
 	struct v10_compute_mqd *m;
 
@@ -295,7 +301,8 @@ static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
 static void restore_mqd(struct mqd_manager *mm, void **mqd,
 			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 			struct queue_properties *qp,
-			const void *mqd_src)
+			const void *mqd_src,
+			const void *ctl_stack_src, const u32 ctl_stack_size)
 {
 	uint64_t addr;
 	struct v10_compute_mqd *m;
@@ -405,7 +412,7 @@ static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
 	return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
 }
 
-static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst, void *ctl_stack_dst)
 {
 	struct v10_sdma_mqd *m;
 
@@ -417,7 +424,9 @@ static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
 static void restore_mqd_sdma(struct mqd_manager *mm, void **mqd,
 			     struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 			     struct queue_properties *qp,
-			     const void *mqd_src)
+			     const void *mqd_src,
+			     const void *ctl_stack_src,
+			     const u32 ctl_stack_size)
 {
 	uint64_t addr;
 	struct v10_sdma_mqd *m;
@@ -478,6 +487,7 @@ struct mqd_manager *mqd_manager_init_v10(enum KFD_MQD_TYPE type,
 		mqd->is_occupied = is_occupied;
 		mqd->mqd_size = sizeof(struct v10_compute_mqd);
 		mqd->get_wave_state = get_wave_state;
+		mqd->get_dump_info = get_dump_info;
 		mqd->dump_mqd = dump_mqd;
 		mqd->restore_mqd = restore_mqd;
 #if defined(CONFIG_DEBUG_FS)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
index 5b6beb69dfc2..db9f138e1135 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -338,21 +338,34 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
 	return 0;
 }
 
-static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+static void get_dump_info(struct mqd_manager *mm, void *mqd, u32 *ctl_stack_size)
+{
+	struct v9_mqd *m = get_mqd(mqd);
+
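+	/* Unlike CIK/VI/v10, where the control stack lives in user mode,
+	 * GFXv9 keeps it in the kernel MQD allocation, so its size must
+	 * be reported for checkpointing.
+	 */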
+	*ctl_stack_size = m->cp_hqd_cntl_stack_size;
+}
+
+static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst, void *ctl_stack_dst)
 {
 	struct v9_mqd *m;
+	/* Control stack is located one page after MQD. */
+	void *ctl_stack = (void *)((uintptr_t)mqd + PAGE_SIZE);
 
 	m = get_mqd(mqd);
 
 	memcpy(mqd_dst, m, sizeof(struct v9_mqd));
+	memcpy(ctl_stack_dst, ctl_stack, m->cp_hqd_cntl_stack_size);
 }
 
 static void restore_mqd(struct mqd_manager *mm, void **mqd,
 			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
-			struct queue_properties *qp, const void *mqd_src)
+			struct queue_properties *qp,
+			const void *mqd_src,
+			const void *ctl_stack_src, u32 ctl_stack_size)
 {
 	uint64_t addr;
 	struct v9_mqd *m;
+	void *ctl_stack;
 
 	m = (struct v9_mqd *) mqd_mem_obj->cpu_ptr;
 	addr = mqd_mem_obj->gpu_addr;
@@ -364,6 +377,9 @@ static void restore_mqd(struct mqd_manager *mm, void **mqd,
 		*gart_addr = addr;
 
 	/* Control stack is located one page after MQD. */
+	ctl_stack = (void *)((uintptr_t)*mqd + PAGE_SIZE);
+	memcpy(ctl_stack, ctl_stack_src, ctl_stack_size);
+
 	m->cp_hqd_pq_doorbell_control =
 		qp->doorbell_off <<
 			CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
@@ -460,7 +476,7 @@ static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
 	return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
 }
 
-static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst, void *ctl_stack_dst)
 {
 	struct v9_sdma_mqd *m;
 
@@ -471,7 +487,9 @@ static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
 
 static void restore_mqd_sdma(struct mqd_manager *mm, void **mqd,
 			     struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
-			     struct queue_properties *qp, const void *mqd_src)
+			     struct queue_properties *qp,
+			     const void *mqd_src,
+			     const void *ctl_stack_src, const u32 ctl_stack_size)
 {
 	uint64_t addr;
 	struct v9_sdma_mqd *m;
@@ -530,6 +548,7 @@ struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
 		mqd->get_wave_state = get_wave_state;
+		mqd->get_dump_info = get_dump_info;
 		mqd->dump_mqd = dump_mqd;
 		mqd->restore_mqd = restore_mqd;
 		mqd->mqd_size = sizeof(struct v9_mqd);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index ae5e3edec92e..88f320abe850 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -303,7 +303,13 @@ static int get_wave_state(struct mqd_manager *mm, void *mqd,
 	return 0;
 }
 
-static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+static void get_dump_info(struct mqd_manager *mm, void *mqd, u32 *ctl_stack_size)
+{
+	/* Control stack is stored in user mode */
+	*ctl_stack_size = 0;
+}
+
+static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst, void *ctl_stack_dst)
 {
 	struct vi_mqd *m;
 
@@ -315,7 +321,8 @@ static void dump_mqd(struct mqd_manager *mm, void *mqd, void *mqd_dst)
 static void restore_mqd(struct mqd_manager *mm, void **mqd,
 			struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 			struct queue_properties *qp,
-			const void *mqd_src)
+			const void *mqd_src,
+			const void *ctl_stack_src, const u32 ctl_stack_size)
 {
 	uint64_t addr;
 	struct vi_mqd *m;
@@ -429,7 +436,7 @@ static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
 	return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
 }
 
-static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
+static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst, void *ctl_stack_dst)
 {
 	struct vi_sdma_mqd *m;
 
@@ -441,7 +448,8 @@ static void dump_mqd_sdma(struct mqd_manager *mm, void *mqd, void *mqd_dst)
 static void restore_mqd_sdma(struct mqd_manager *mm, void **mqd,
 			     struct kfd_mem_obj *mqd_mem_obj, uint64_t *gart_addr,
 			     struct queue_properties *qp,
-			     const void *mqd_src)
+			     const void *mqd_src,
+			     const void *ctl_stack_src, const u32 ctl_stack_size)
 {
 	uint64_t addr;
 	struct vi_sdma_mqd *m;
@@ -500,6 +508,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		mqd->destroy_mqd = destroy_mqd;
 		mqd->is_occupied = is_occupied;
 		mqd->get_wave_state = get_wave_state;
+		mqd->get_dump_info = get_dump_info;
 		mqd->dump_mqd = dump_mqd;
 		mqd->restore_mqd = restore_mqd;
 		mqd->mqd_size = sizeof(struct vi_mqd);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 5d9efcc63208..7ed6f831109d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1117,6 +1117,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			    unsigned int *qid,
 			    const struct kfd_criu_queue_priv_data *q_data,
 			    const void *restore_mqd,
+			    const void *restore_ctl_stack,
 			    uint32_t *p_doorbell_offset_in_process);
 int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
 int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid,
@@ -1139,11 +1140,12 @@ int amdkfd_fence_wait_timeout(uint64_t *fence_addr,
 			      uint64_t fence_value,
 			      unsigned int timeout_ms);
 int pqm_get_queue_dump_info(struct process_queue_manager *pqm,
-			unsigned int qid,
-			u32 *mqd_size);
+			    unsigned int qid,
+			    u32 *mqd_size, u32 *ctl_stack_size);
 int pqm_dump_mqd(struct process_queue_manager *pqm,
 		       unsigned int qid,
-		       void *dst_mqd);
+		       void *dst_mqd,
+		       void *dst_ctl_stack);
 
 /* Packet Manager */
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index a7eb1c6a700f..a4f757efc4e5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -209,6 +209,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			    unsigned int *qid,
 			    const struct kfd_criu_queue_priv_data *q_data,
 			    const void *restore_mqd,
+			    const void *restore_ctl_stack,
 			    uint32_t *p_doorbell_offset_in_process)
 {
 	int retval;
@@ -273,7 +274,8 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			goto err_create_queue;
 		pqn->q = q;
 		pqn->kq = NULL;
-		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data, restore_mqd);
+		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data,
+						    restore_mqd, restore_ctl_stack);
 		print_queue(q);
 		break;
 
@@ -293,7 +295,8 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 			goto err_create_queue;
 		pqn->q = q;
 		pqn->kq = NULL;
-		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data, restore_mqd);
+		retval = dev->dqm->ops.create_queue(dev->dqm, q, &pdd->qpd, q_data,
+						    restore_mqd, restore_ctl_stack);
 		print_queue(q);
 		break;
 	case KFD_QUEUE_TYPE_DIQ:
@@ -528,9 +531,8 @@ int pqm_get_wave_state(struct process_queue_manager *pqm,
 						       save_area_used_size);
 }
 
-int pqm_get_queue_dump_info(struct process_queue_manager *pqm,
-			unsigned int qid,
-			u32 *mqd_size)
+int pqm_get_queue_dump_info(struct process_queue_manager *pqm, unsigned int qid,
+			u32 *mqd_size, u32 *ctl_stack_size)
 {
 	struct process_queue_node *pqn;
 
@@ -546,12 +548,13 @@ int pqm_get_queue_dump_info(struct process_queue_manager *pqm,
 	}
 
 	pqn->q->device->dqm->ops.get_queue_dump_info(pqn->q->device->dqm,
-						       pqn->q, mqd_size);
+						       pqn->q, mqd_size,
+						       ctl_stack_size);
 	return 0;
 }
 
 int pqm_dump_mqd(struct process_queue_manager *pqm,
-		       unsigned int qid, void *mqd)
+		       unsigned int qid, void *mqd, void *ctl_stack)
 {
 	struct process_queue_node *pqn;
 
@@ -567,7 +570,7 @@ int pqm_dump_mqd(struct process_queue_manager *pqm,
 	}
 
 	return pqn->q->device->dqm->ops.dump_mqd(pqn->q->device->dqm,
-						       pqn->q, mqd);
+						       pqn->q, mqd, ctl_stack);
 }
 
 #if defined(CONFIG_DEBUG_FS)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 15/18] drm/amdkfd: CRIU dump and restore events
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (13 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 14/18] drm/amdkfd: CRIU dump/restore queue control stack David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-23 18:39   ` Felix Kuehling
  2021-08-19 13:37 ` [PATCH 16/18] drm/amdkfd: CRIU implement gpu_id remapping David Yat Sin
                   ` (2 subsequent siblings)
  17 siblings, 1 reply; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

Add support to the existing CRIU ioctls to save events during CRIU
checkpoint and to restore them during CRIU restore.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
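One mechanism worth highlighting: restore pins each event to its
checkpointed ID by passing a one-slot [start, end) window to
idr_alloc(), which returns exactly that ID or fails if the slot is
already occupied. A minimal illustration of the pattern used in
allocate_event_notification_slot() and create_other_event() below:

	/* Pin the IDR allocation to exactly restore_id; idr_alloc()
	 * returns the ID on success or -ENOSPC if the slot is taken.
	 */
	id = idr_alloc(&p->event_idr, ev, restore_id, restore_id + 1,
		       GFP_KERNEL);
	if (id < 0)
		return id;
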
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 130 +++++++-----
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  | 253 ++++++++++++++++++++---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  25 ++-
 3 files changed, 329 insertions(+), 79 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 19f16e3dd769..c8f523d8ab81 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1008,51 +1008,11 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 	 * through the event_page_offset field.
 	 */
 	if (args->event_page_offset) {
-		struct kfd_dev *kfd;
-		struct kfd_process_device *pdd;
-		void *mem, *kern_addr;
-		uint64_t size;
-
-		if (p->signal_page) {
-			pr_err("Event page is already set\n");
-			return -EINVAL;
-		}
-
-		kfd = kfd_device_by_id(GET_GPU_ID(args->event_page_offset));
-		if (!kfd) {
-			pr_err("Getting device by id failed in %s\n", __func__);
-			return -EINVAL;
-		}
-
 		mutex_lock(&p->mutex);
-		pdd = kfd_bind_process_to_device(kfd, p);
-		if (IS_ERR(pdd)) {
-			err = PTR_ERR(pdd);
-			goto out_unlock;
-		}
-
-		mem = kfd_process_device_translate_handle(pdd,
-				GET_IDR_HANDLE(args->event_page_offset));
-		if (!mem) {
-			pr_err("Can't find BO, offset is 0x%llx\n",
-			       args->event_page_offset);
-			err = -EINVAL;
-			goto out_unlock;
-		}
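+		/* Mapping the signal page is factored out into
+		 * kfd_kmap_event_page() so the CRIU restore path can
+		 * reuse it.
+		 */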
+		err = kfd_kmap_event_page(p, args->event_page_offset);
 		mutex_unlock(&p->mutex);
-
-		err = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(kfd->kgd,
-						mem, &kern_addr, &size);
-		if (err) {
-			pr_err("Failed to map event page to kernel\n");
-			return err;
-		}
-
-		err = kfd_event_page_set(p, kern_addr, size);
-		if (err) {
-			pr_err("Failed to set event page\n");
+		if (err)
 			return err;
-		}
 	}
 
 	err = kfd_event_create(filp, p, args->event_type,
@@ -1061,10 +1021,7 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 				&args->event_page_offset,
 				&args->event_slot_index);
 
-	return err;
-
-out_unlock:
-	mutex_unlock(&p->mutex);
+	pr_debug("Created event (id:0x%08x) (%s)\n", args->event_id, __func__);
 	return err;
 }
 
@@ -2208,6 +2165,41 @@ static int criu_dump_queues(struct kfd_process *p, struct kfd_ioctl_criu_dumper_
 	return ret;
 }
 
+static int criu_dump_events(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
+{
+	struct kfd_criu_event_bucket *ev_buckets;
+	uint32_t num_events;
+	int ret = 0;
+
+	num_events = kfd_get_num_events(p);
+	if (args->num_objects != num_events) {
+		pr_err("Mismatch with number of events (current:%u user:%lld)\n",
+		       num_events, args->num_objects);
+		return -EINVAL;
+	}
+
+	if (args->objects_size != args->num_objects *
+				  (sizeof(*ev_buckets) + sizeof(struct kfd_criu_event_priv_data))) {
+		pr_err("Invalid objects size for events\n");
+		return -EINVAL;
+	}
+
+	ev_buckets = kvzalloc(args->objects_size, GFP_KERNEL);
+	if (!ev_buckets)
+		return -ENOMEM;
+
+	ret = kfd_event_dump(p, ev_buckets, args->num_objects);
+	if (!ret) {
+		ret = copy_to_user((void __user *)args->objects, ev_buckets, args->objects_size);
+		if (ret) {
+			pr_err("Failed to copy events information to user\n");
+			ret = -EFAULT;
+		}
+	}
+	kvfree(ev_buckets);
+	return ret;
+}
+
 static int kfd_ioctl_criu_dumper(struct file *filep,
 				struct kfd_process *p, void *data)
 {
@@ -2246,6 +2238,8 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
 		ret = criu_dump_queues(p, args);
 		break;
 	case KFD_CRIU_OBJECT_TYPE_EVENT:
+		ret = criu_dump_events(p, args);
+		break;
 	case KFD_CRIU_OBJECT_TYPE_DEVICE:
 	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
 	default:
@@ -2676,6 +2670,40 @@ static int criu_restore_queues(struct kfd_process *p,
 	return ret;
 }
 
+static int criu_restore_events(struct file *filp, struct kfd_process *p,
+			struct kfd_ioctl_criu_restorer_args *args)
+{
+	int ret = 0, i;
+	uint8_t *objects, *private_data;
+	struct kfd_criu_event_bucket *ev_buckets;
+
+	objects = kvzalloc(args->objects_size, GFP_KERNEL);
+	if (!objects)
+		return -ENOMEM;
+
+	ret = copy_from_user(objects, (void __user *)args->objects, args->objects_size);
+	if (ret) {
+		pr_err("Failed to copy event information from user\n");
+		ret = -EFAULT;
+		goto exit;
+	}
+
+	ev_buckets = (struct kfd_criu_event_bucket *) objects;
+	private_data = (void *)(ev_buckets + args->num_objects);
+
+	for (i = 0; i < args->num_objects; i++) {
+		ret = kfd_event_restore(filp, p, &ev_buckets[i], private_data);
+		if (ret) {
+			pr_err("Failed to restore event (%d)\n", ret);
+			goto exit;
+		}
+	}
+
+exit:
+	kvfree(ev_buckets);
+	return ret;
+}
+
 static int kfd_ioctl_criu_restorer(struct file *filep,
 				struct kfd_process *p, void *data)
 {
@@ -2698,6 +2726,8 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
 		ret = criu_restore_queues(p, args);
 		break;
 	case KFD_CRIU_OBJECT_TYPE_EVENT:
+		ret = criu_restore_events(filep, p, args);
+		break;
 	case KFD_CRIU_OBJECT_TYPE_DEVICE:
 	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
 	default:
@@ -2799,9 +2829,13 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
 	args->queues_priv_data_size = queues_extra_data_size +
 				(args->total_queues * sizeof(struct kfd_criu_queue_priv_data));
 
-	dev_dbg(kfd_device, "Num of bos:%llu queues:%u\n",
+	args->total_events = kfd_get_num_events(p);
+	args->events_priv_data_size = args->total_events * sizeof(struct kfd_criu_event_priv_data);
+
+	dev_dbg(kfd_device, "Num of bos:%llu queues:%u events:%u\n",
 				args->total_bos,
-				args->total_queues);
+				args->total_queues,
+				args->total_events);
 err_unlock:
 	mutex_unlock(&p->mutex);
 	return ret;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index ba2c2ce0c55a..18362478e351 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -53,9 +53,9 @@ struct kfd_signal_page {
 	uint64_t *kernel_address;
 	uint64_t __user *user_address;
 	bool need_to_free_pages;
+	uint64_t user_handle; /* Needed for CRIU dump and restore */
 };
 
-
 static uint64_t *page_slots(struct kfd_signal_page *page)
 {
 	return page->kernel_address;
@@ -92,7 +92,8 @@ static struct kfd_signal_page *allocate_signal_page(struct kfd_process *p)
 }
 
 static int allocate_event_notification_slot(struct kfd_process *p,
-					    struct kfd_event *ev)
+					    struct kfd_event *ev,
+					    const int *restore_id)
 {
 	int id;
 
@@ -104,14 +105,19 @@ static int allocate_event_notification_slot(struct kfd_process *p,
 		p->signal_mapped_size = 256*8;
 	}
 
-	/*
-	 * Compatibility with old user mode: Only use signal slots
-	 * user mode has mapped, may be less than
-	 * KFD_SIGNAL_EVENT_LIMIT. This also allows future increase
-	 * of the event limit without breaking user mode.
-	 */
-	id = idr_alloc(&p->event_idr, ev, 0, p->signal_mapped_size / 8,
-		       GFP_KERNEL);
+	if (restore_id) {
+		id = idr_alloc(&p->event_idr, ev, *restore_id, *restore_id + 1,
+				GFP_KERNEL);
+	} else {
+		/*
+		 * Compatibility with old user mode: Only use signal slots
+		 * user mode has mapped, may be less than
+		 * KFD_SIGNAL_EVENT_LIMIT. This also allows future increase
+		 * of the event limit without breaking user mode.
+		 */
+		id = idr_alloc(&p->event_idr, ev, 0, p->signal_mapped_size / 8,
+				GFP_KERNEL);
+	}
 	if (id < 0)
 		return id;
 
@@ -178,9 +184,8 @@ static struct kfd_event *lookup_signaled_event_by_partial_id(
 	return ev;
 }
 
-static int create_signal_event(struct file *devkfd,
-				struct kfd_process *p,
-				struct kfd_event *ev)
+static int create_signal_event(struct file *devkfd, struct kfd_process *p,
+				struct kfd_event *ev, const int *restore_id)
 {
 	int ret;
 
@@ -193,7 +198,7 @@ static int create_signal_event(struct file *devkfd,
 		return -ENOSPC;
 	}
 
-	ret = allocate_event_notification_slot(p, ev);
+	ret = allocate_event_notification_slot(p, ev, restore_id);
 	if (ret) {
 		pr_warn("Signal event wasn't created because out of kernel memory\n");
 		return ret;
@@ -209,16 +214,22 @@ static int create_signal_event(struct file *devkfd,
 	return 0;
 }
 
-static int create_other_event(struct kfd_process *p, struct kfd_event *ev)
+static int create_other_event(struct kfd_process *p, struct kfd_event *ev, const int *restore_id)
 {
-	/* Cast KFD_LAST_NONSIGNAL_EVENT to uint32_t. This allows an
-	 * intentional integer overflow to -1 without a compiler
-	 * warning. idr_alloc treats a negative value as "maximum
-	 * signed integer".
-	 */
-	int id = idr_alloc(&p->event_idr, ev, KFD_FIRST_NONSIGNAL_EVENT_ID,
-			   (uint32_t)KFD_LAST_NONSIGNAL_EVENT_ID + 1,
-			   GFP_KERNEL);
+	int id;
+
+	if (restore_id)
+		id = idr_alloc(&p->event_idr, ev, *restore_id, *restore_id + 1,
+			GFP_KERNEL);
+	else
+		/* Cast KFD_LAST_NONSIGNAL_EVENT to uint32_t. This allows an
+		 * intentional integer overflow to -1 without a compiler
+		 * warning. idr_alloc treats a negative value as "maximum
+		 * signed integer".
+		 */
+		id = idr_alloc(&p->event_idr, ev, KFD_FIRST_NONSIGNAL_EVENT_ID,
+				(uint32_t)KFD_LAST_NONSIGNAL_EVENT_ID + 1,
+				GFP_KERNEL);
 
 	if (id < 0)
 		return id;
@@ -295,8 +306,8 @@ static bool event_can_be_cpu_signaled(const struct kfd_event *ev)
 	return ev->type == KFD_EVENT_TYPE_SIGNAL;
 }
 
-int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
-		       uint64_t size)
+static int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
+		       uint64_t size, uint64_t user_handle)
 {
 	struct kfd_signal_page *page;
 
@@ -315,10 +326,55 @@ int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
 
 	p->signal_page = page;
 	p->signal_mapped_size = size;
-
+	p->signal_page->user_handle = user_handle;
 	return 0;
 }
 
+int kfd_kmap_event_page(struct kfd_process *p, uint64_t event_page_offset)
+{
+	struct kfd_dev *kfd;
+	struct kfd_process_device *pdd;
+	void *mem, *kern_addr;
+	uint64_t size;
+	int err = 0;
+
+	if (p->signal_page) {
+		pr_err("Event page is already set\n");
+		return -EINVAL;
+	}
+
+	kfd = kfd_device_by_id(GET_GPU_ID(event_page_offset));
+	if (!kfd) {
+		pr_err("Getting device by id failed in %s\n", __func__);
+		return -EINVAL;
+	}
+
+	pdd = kfd_bind_process_to_device(kfd, p);
+	if (IS_ERR(pdd))
+		return PTR_ERR(pdd);
+
+	mem = kfd_process_device_translate_handle(pdd,
+			GET_IDR_HANDLE(event_page_offset));
+	if (!mem) {
+		pr_err("Can't find BO, offset is 0x%llx\n", event_page_offset);
+		return -EINVAL;
+	}
+
+	err = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(kfd->kgd,
+					mem, &kern_addr, &size);
+	if (err) {
+		pr_err("Failed to map event page to kernel\n");
+		return err;
+	}
+
+	err = kfd_event_page_set(p, kern_addr, size, event_page_offset);
+	if (err) {
+		pr_err("Failed to set event page\n");
+		return err;
+	}
+	return err;
+}
+
 int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint32_t event_type, bool auto_reset, uint32_t node_id,
 		     uint32_t *event_id, uint32_t *event_trigger_data,
@@ -343,14 +399,14 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 	switch (event_type) {
 	case KFD_EVENT_TYPE_SIGNAL:
 	case KFD_EVENT_TYPE_DEBUG:
-		ret = create_signal_event(devkfd, p, ev);
+		ret = create_signal_event(devkfd, p, ev, NULL);
 		if (!ret) {
 			*event_page_offset = KFD_MMAP_TYPE_EVENTS;
 			*event_slot_index = ev->event_id;
 		}
 		break;
 	default:
-		ret = create_other_event(p, ev);
+		ret = create_other_event(p, ev, NULL);
 		break;
 	}
 
@@ -366,6 +422,147 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 	return ret;
 }
 
+int kfd_event_restore(struct file *devkfd, struct kfd_process *p,
+		      struct kfd_criu_event_bucket *ev_bucket,
+		      uint8_t *priv_datas)
+{
+	int ret = 0;
+	struct kfd_criu_event_priv_data *ev_priv;
+	struct kfd_event *ev;
+
+	ev_priv = (struct kfd_criu_event_priv_data *)(priv_datas + ev_bucket->priv_data_offset);
+
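+	/* The signal page handle is checkpointed with the first event
+	 * only (see kfd_event_dump()), so remap the page before
+	 * restoring the events that reference it.
+	 */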
+	if (ev_priv->user_handle) {
+		ret = kfd_kmap_event_page(p, ev_priv->user_handle);
+		if (ret)
+			return ret;
+	}
+
+	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+	if (!ev)
+		return -ENOMEM;
+
+	ev->type = ev_priv->type;
+	ev->auto_reset = ev_priv->auto_reset;
+	ev->signaled = ev_priv->signaled;
+
+	init_waitqueue_head(&ev->wq);
+
+	mutex_lock(&p->event_mutex);
+	switch (ev->type) {
+	case KFD_EVENT_TYPE_SIGNAL:
+	case KFD_EVENT_TYPE_DEBUG:
+		ret = create_signal_event(devkfd, p, ev, &ev_priv->event_id);
+		break;
+	case KFD_EVENT_TYPE_MEMORY:
+		memcpy(&ev->memory_exception_data,
+			&ev_priv->memory_exception_data,
+			sizeof(struct kfd_hsa_memory_exception_data));
+
+		ev->memory_exception_data.gpu_id = ev_bucket->gpu_id;
+		ret = create_other_event(p, ev, &ev_priv->event_id);
+		break;
+	case KFD_EVENT_TYPE_HW_EXCEPTION:
+		memcpy(&ev->hw_exception_data,
+			&ev_priv->hw_exception_data,
+			sizeof(struct kfd_hsa_hw_exception_data));
+
+		ev->hw_exception_data.gpu_id = ev_bucket->gpu_id;
+		ret = create_other_event(p, ev, &ev_priv->event_id);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	if (ret)
+		kfree(ev);
+
+	mutex_unlock(&p->event_mutex);
+
+	return ret;
+}
+
+int kfd_event_dump(struct kfd_process *p,
+		   struct kfd_criu_event_bucket *ev_buckets,
+		   uint32_t num_events)
+{
+	struct kfd_event *ev;
+	struct kfd_criu_event_priv_data *ev_privs;
+	uint32_t ev_id;
+	int i = 0;
+
+	/* Private data for first event starts after all ev_buckets */
+	ev_privs = (struct kfd_criu_event_priv_data *)((uint8_t *)ev_buckets +
+						   (num_events * (sizeof(*ev_buckets))));
+
+	idr_for_each_entry(&p->event_idr, ev, ev_id) {
+		struct kfd_criu_event_bucket *ev_bucket;
+		struct kfd_criu_event_priv_data *ev_priv;
+
+		if (i >= num_events) {
+			pr_err("Number of events exceeds number allocated\n");
+			return -ENOMEM;
+		}
+
+		ev_bucket = &ev_buckets[i];
+
+		/* Currently all events have the same private_data size, but
+		 * the ioctls and the CRIU plugin support private_data of
+		 * variable sizes.
+		 */
+		ev_priv = &ev_privs[i];
+
+		ev_bucket->priv_data_offset = i * sizeof(*ev_priv);
+		ev_bucket->priv_data_size = sizeof(*ev_priv);
+
+		/* We store the user_handle with the first event */
+		if (i == 0 && p->signal_page)
+			ev_priv->user_handle = p->signal_page->user_handle;
+
+		ev_priv->event_id = ev->event_id;
+		ev_priv->auto_reset = ev->auto_reset;
+		ev_priv->type = ev->type;
+		ev_priv->signaled = ev->signaled;
+
+		/* We store the gpu_id in the bucket section so that the userspace CRIU plugin can
+		 * modify it if needed.
+		 */
+		if (ev_priv->type == KFD_EVENT_TYPE_MEMORY) {
+			memcpy(&ev_priv->memory_exception_data,
+				&ev->memory_exception_data,
+				sizeof(struct kfd_hsa_memory_exception_data));
+
+			ev_bucket->gpu_id = ev_priv->memory_exception_data.gpu_id;
+		} else if (ev_priv->type == KFD_EVENT_TYPE_HW_EXCEPTION) {
+			memcpy(&ev_priv->hw_exception_data,
+				&ev->hw_exception_data,
+				sizeof(struct kfd_hsa_hw_exception_data));
+
+			ev_bucket->gpu_id = ev_priv->hw_exception_data.gpu_id;
+		} else {
+			ev_bucket->gpu_id = 0;
+		}
+
+		pr_debug("Dumped event[%d] id = 0x%08x auto_reset = %x type = %x signaled = %x\n",
+			  i,
+			  ev_priv->event_id,
+			  ev_priv->auto_reset,
+			  ev_priv->type,
+			  ev_priv->signaled);
+		i++;
+	}
+	return 0;
+}
+
+int kfd_get_num_events(struct kfd_process *p)
+{
+	struct kfd_event *ev;
+	uint32_t id;
+	u32 num_events = 0;
+
+	idr_for_each_entry(&p->event_idr, ev, id)
+		num_events++;
+
+	return num_events;
+}
+
 /* Assumes that p is current. */
 int kfd_event_destroy(struct kfd_process *p, uint32_t event_id)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 7ed6f831109d..bf10a5305ef7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1069,9 +1069,26 @@ struct kfd_criu_queue_priv_data {
 };
 
 struct kfd_criu_event_priv_data {
-	uint64_t reserved;
+	uint64_t user_handle;
+	uint32_t event_id;
+	uint32_t auto_reset;
+	uint32_t type;
+	uint32_t signaled;
+
+	union {
+		struct kfd_hsa_memory_exception_data memory_exception_data;
+		struct kfd_hsa_hw_exception_data hw_exception_data;
+	};
 };
 
+int kfd_event_restore(struct file *devkfd, struct kfd_process *p,
+		      struct kfd_criu_event_bucket *ev_bucket,
+		      uint8_t *priv_datas);
+
+int kfd_event_dump(struct kfd_process *p,
+		   struct kfd_criu_event_bucket *ev_buckets,
+		   uint32_t num_events);
+
 /* CRIU - End */
 
 /* Queue Context Management */
@@ -1238,12 +1255,14 @@ void kfd_signal_iommu_event(struct kfd_dev *dev,
 void kfd_signal_hw_exception_event(u32 pasid);
 int kfd_set_event(struct kfd_process *p, uint32_t event_id);
 int kfd_reset_event(struct kfd_process *p, uint32_t event_id);
-int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
-		       uint64_t size);
+int kfd_kmap_event_page(struct kfd_process *p, uint64_t event_page_offset);
+
 int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint32_t event_type, bool auto_reset, uint32_t node_id,
 		     uint32_t *event_id, uint32_t *event_trigger_data,
 		     uint64_t *event_page_offset, uint32_t *event_slot_index);
+
+int kfd_get_num_events(struct kfd_process *p);
 int kfd_event_destroy(struct kfd_process *p, uint32_t event_id);
 
 void kfd_signal_vm_fault_event(struct kfd_dev *dev, u32 pasid,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 16/18] drm/amdkfd: CRIU implement gpu_id remapping
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (14 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 15/18] drm/amdkfd: CRIU dump and restore events David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-23 18:48   ` Felix Kuehling
  2021-08-19 13:37 ` [PATCH 17/18] Revert "drm/amdgpu: Remove verify_access shortcut for KFD BOs" David Yat Sin
  2021-08-19 13:37 ` [PATCH 18/18] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects David Yat Sin
  17 siblings, 1 reply; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

When restoring on a different node, the gpu_ids on the restore node
may differ, but the user space application will still use the original
gpu_ids in its ioctl calls. Add code to create a gpu_id mapping so
that KFD can determine the actual gpu_id during user ioctls.

Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
---
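The heart of the remapping is that ioctls now resolve the caller's
gpu_id against the process's own device list via
kfd_process_device_data_by_id() (added in kfd_process.c) instead of
the global kfd_device_by_id() table. A plausible shape for that
helper, assuming the p->pdds[]/n_pdds process-device array and a field
recording the gpu_id each pdd was created or restored with (the field
name below is illustrative, not taken from this patch):

	struct kfd_process_device *
	kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)
	{
		int i;

		for (i = 0; i < p->n_pdds; i++)
			if (p->pdds[i]->user_gpu_id == gpu_id)
				return p->pdds[i];

		return NULL;
	}
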
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 400 +++++++++++++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  |   5 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  10 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  18 +
 4 files changed, 324 insertions(+), 109 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index c8f523d8ab81..90e4d4ce4398 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -294,13 +294,14 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 		return err;
 
 	pr_debug("Looking for gpu id 0x%x\n", args->gpu_id);
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev) {
+
+	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
 		pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+		mutex_unlock(&p->mutex);
 		return -EINVAL;
 	}
-
-	mutex_lock(&p->mutex);
+	dev = pdd->dev;
 
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
@@ -491,7 +492,6 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
 					struct kfd_process *p, void *data)
 {
 	struct kfd_ioctl_set_memory_policy_args *args = data;
-	struct kfd_dev *dev;
 	int err = 0;
 	struct kfd_process_device *pdd;
 	enum cache_policy default_policy, alternate_policy;
@@ -506,13 +506,15 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
 		return -EINVAL;
 	}
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
-		return -EINVAL;
-
 	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
+		pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
+		err = -EINVAL;
+		goto out;
+	}
 
-	pdd = kfd_bind_process_to_device(dev, p);
+	pdd = kfd_bind_process_to_device(pdd->dev, p);
 	if (IS_ERR(pdd)) {
 		err = -ESRCH;
 		goto out;
@@ -525,7 +527,7 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
 		(args->alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
 		   ? cache_policy_coherent : cache_policy_noncoherent;
 
-	if (!dev->dqm->ops.set_cache_memory_policy(dev->dqm,
+	if (!pdd->dev->dqm->ops.set_cache_memory_policy(pdd->dev->dqm,
 				&pdd->qpd,
 				default_policy,
 				alternate_policy,
@@ -543,17 +545,18 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
 					struct kfd_process *p, void *data)
 {
 	struct kfd_ioctl_set_trap_handler_args *args = data;
-	struct kfd_dev *dev;
 	int err = 0;
 	struct kfd_process_device *pdd;
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
-		return -EINVAL;
-
 	mutex_lock(&p->mutex);
 
-	pdd = kfd_bind_process_to_device(dev, p);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	pdd = kfd_bind_process_to_device(pdd->dev, p);
 	if (IS_ERR(pdd)) {
 		err = -ESRCH;
 		goto out;
@@ -577,16 +580,20 @@ static int kfd_ioctl_dbg_register(struct file *filep,
 	bool create_ok;
 	long status = 0;
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
-		return -EINVAL;
+	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
+		status = -EINVAL;
+		goto out_unlock_p;
+	}
+	dev = pdd->dev;
 
 	if (dev->device_info->asic_family == CHIP_CARRIZO) {
 		pr_debug("kfd_ioctl_dbg_register not supported on CZ\n");
-		return -EINVAL;
+		status = -EINVAL;
+		goto out_unlock_p;
 	}
 
-	mutex_lock(&p->mutex);
 	mutex_lock(kfd_get_dbgmgr_mutex());
 
 	/*
@@ -596,7 +603,7 @@ static int kfd_ioctl_dbg_register(struct file *filep,
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
 		status = PTR_ERR(pdd);
-		goto out;
+		goto out_unlock_dbg;
 	}
 
 	if (!dev->dbgmgr) {
@@ -614,8 +621,9 @@ static int kfd_ioctl_dbg_register(struct file *filep,
 		status = -EINVAL;
 	}
 
-out:
+out_unlock_dbg:
 	mutex_unlock(kfd_get_dbgmgr_mutex());
+out_unlock_p:
 	mutex_unlock(&p->mutex);
 
 	return status;
@@ -625,12 +633,18 @@ static int kfd_ioctl_dbg_unregister(struct file *filep,
 				struct kfd_process *p, void *data)
 {
 	struct kfd_ioctl_dbg_unregister_args *args = data;
+	struct kfd_process_device *pdd;
 	struct kfd_dev *dev;
 	long status;
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev || !dev->dbgmgr)
+	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd || !pdd->dev->dbgmgr) {
+		mutex_unlock(&p->mutex);
 		return -EINVAL;
+	}
+	dev = pdd->dev;
+	mutex_unlock(&p->mutex);
 
 	if (dev->device_info->asic_family == CHIP_CARRIZO) {
 		pr_debug("kfd_ioctl_dbg_unregister not supported on CZ\n");
@@ -664,6 +678,7 @@ static int kfd_ioctl_dbg_address_watch(struct file *filep,
 {
 	struct kfd_ioctl_dbg_address_watch_args *args = data;
 	struct kfd_dev *dev;
+	struct kfd_process_device *pdd;
 	struct dbg_address_watch_info aw_info;
 	unsigned char *args_buff;
 	long status;
@@ -673,9 +688,15 @@ static int kfd_ioctl_dbg_address_watch(struct file *filep,
 
 	memset((void *) &aw_info, 0, sizeof(struct dbg_address_watch_info));
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
+	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
+		mutex_unlock(&p->mutex);
+		pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
 		return -EINVAL;
+	}
+	dev = pdd->dev;
+	mutex_unlock(&p->mutex);
 
 	if (dev->device_info->asic_family == CHIP_CARRIZO) {
 		pr_debug("kfd_ioctl_dbg_wave_control not supported on CZ\n");
@@ -764,6 +785,7 @@ static int kfd_ioctl_dbg_wave_control(struct file *filep,
 {
 	struct kfd_ioctl_dbg_wave_control_args *args = data;
 	struct kfd_dev *dev;
+	struct kfd_process_device *pdd;
 	struct dbg_wave_control_info wac_info;
 	unsigned char *args_buff;
 	uint32_t computed_buff_size;
@@ -781,9 +803,15 @@ static int kfd_ioctl_dbg_wave_control(struct file *filep,
 				sizeof(wac_info.dbgWave_msg.MemoryVA) +
 				sizeof(wac_info.trapId);
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
+	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
+		mutex_unlock(&p->mutex);
+		pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
 		return -EINVAL;
+	}
+	dev = pdd->dev;
+	mutex_unlock(&p->mutex);
 
 	if (dev->device_info->asic_family == CHIP_CARRIZO) {
 		pr_debug("kfd_ioctl_dbg_wave_control not supported on CZ\n");
@@ -847,16 +875,19 @@ static int kfd_ioctl_get_clock_counters(struct file *filep,
 				struct kfd_process *p, void *data)
 {
 	struct kfd_ioctl_get_clock_counters_args *args = data;
-	struct kfd_dev *dev;
+	struct kfd_process_device *pdd;
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (dev)
+	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (pdd)
 		/* Reading GPU clock counter from KGD */
-		args->gpu_clock_counter = amdgpu_amdkfd_get_gpu_clock_counter(dev->kgd);
+		args->gpu_clock_counter = amdgpu_amdkfd_get_gpu_clock_counter(pdd->dev->kgd);
 	else
 		/* Node without GPU resource */
 		args->gpu_clock_counter = 0;
 
+	mutex_unlock(&p->mutex);
+
 	/* No access to rdtsc. Using raw monotonic time */
 	args->cpu_clock_counter = ktime_get_raw_ns();
 	args->system_clock_counter = ktime_get_boottime_ns();
@@ -1070,11 +1101,13 @@ static int kfd_ioctl_set_scratch_backing_va(struct file *filep,
 	struct kfd_dev *dev;
 	long err;
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
-		return -EINVAL;
-
 	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
+		err = -EINVAL;
+		goto bind_process_to_device_fail;
+	}
+	dev = pdd->dev;
 
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
@@ -1102,15 +1135,20 @@ static int kfd_ioctl_get_tile_config(struct file *filep,
 		struct kfd_process *p, void *data)
 {
 	struct kfd_ioctl_get_tile_config_args *args = data;
-	struct kfd_dev *dev;
+	struct kfd_process_device *pdd;
 	struct tile_config config;
 	int err = 0;
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
+	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
+		mutex_unlock(&p->mutex);
 		return -EINVAL;
+	}
 
-	amdgpu_amdkfd_get_tile_config(dev->kgd, &config);
+	amdgpu_amdkfd_get_tile_config(pdd->dev->kgd, &config);
+
+	mutex_unlock(&p->mutex);
 
 	args->gb_addr_config = config.gb_addr_config;
 	args->num_banks = config.num_banks;
@@ -1145,21 +1183,15 @@ static int kfd_ioctl_acquire_vm(struct file *filep, struct kfd_process *p,
 {
 	struct kfd_ioctl_acquire_vm_args *args = data;
 	struct kfd_process_device *pdd;
-	struct kfd_dev *dev;
 	struct file *drm_file;
 	int ret;
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
-		return -EINVAL;
-
 	drm_file = fget(args->drm_fd);
 	if (!drm_file)
 		return -EINVAL;
 
 	mutex_lock(&p->mutex);
-
-	pdd = kfd_get_process_device_data(dev, p);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
 	if (!pdd) {
 		ret = -EINVAL;
 		goto err_unlock;
@@ -1218,19 +1250,23 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
 	if (args->size == 0)
 		return -EINVAL;
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
-		return -EINVAL;
+	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
+		err = -EINVAL;
+		goto err_unlock;
+	}
+
+	dev = pdd->dev;
 
 	if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) &&
 		(flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) &&
 		!kfd_dev_is_large_bar(dev)) {
 		pr_err("Alloc host visible vram on small bar is not allowed\n");
-		return -EINVAL;
+		err = -EINVAL;
+		goto err_unlock;
 	}
 
-	mutex_lock(&p->mutex);
-
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
 		err = PTR_ERR(pdd);
@@ -1301,17 +1337,12 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 	struct kfd_ioctl_free_memory_of_gpu_args *args = data;
 	struct kfd_process_device *pdd;
 	void *mem;
-	struct kfd_dev *dev;
 	int ret;
 	uint64_t size = 0;
 
-	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
-	if (!dev)
-		return -EINVAL;
-
 	mutex_lock(&p->mutex);
 
-	pdd = kfd_get_process_device_data(dev, p);
+	pdd = kfd_process_device_data_by_id(p, GET_GPU_ID(args->handle));
 	if (!pdd) {
 		pr_err("Process device data doesn't exist\n");
 		ret = -EINVAL;
@@ -1325,7 +1356,7 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
 		goto err_unlock;
 	}
 
-	ret = amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd,
+	ret = amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd,
 				(struct kgd_mem *)mem, pdd->drm_priv, &size);
 
 	/* If freeing the buffer failed, leave the handle in place for
@@ -1348,15 +1379,11 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 	struct kfd_ioctl_map_memory_to_gpu_args *args = data;
 	struct kfd_process_device *pdd, *peer_pdd;
 	void *mem;
-	struct kfd_dev *dev, *peer;
+	struct kfd_dev *dev;
 	long err = 0;
 	int i;
 	uint32_t *devices_arr = NULL;
 
-	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
-	if (!dev)
-		return -EINVAL;
-
 	if (!args->n_devices) {
 		pr_debug("Device IDs array empty\n");
 		return -EINVAL;
@@ -1380,6 +1407,12 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 	}
 
 	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, GET_GPU_ID(args->handle));
+	if (!pdd) {
+		err = -EINVAL;
+		goto get_process_device_data_failed;
+	}
+	dev = pdd->dev;
 
 	pdd = kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd)) {
@@ -1395,21 +1428,21 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 	}
 
 	for (i = args->n_success; i < args->n_devices; i++) {
-		peer = kfd_device_by_id(devices_arr[i]);
-		if (!peer) {
+		peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]);
+		if (!peer_pdd) {
 			pr_debug("Getting device by id failed for 0x%x\n",
 				 devices_arr[i]);
 			err = -EINVAL;
 			goto get_mem_obj_from_handle_failed;
 		}
 
-		peer_pdd = kfd_bind_process_to_device(peer, p);
+		peer_pdd = kfd_bind_process_to_device(peer_pdd->dev, p);
 		if (IS_ERR(peer_pdd)) {
 			err = PTR_ERR(peer_pdd);
 			goto get_mem_obj_from_handle_failed;
 		}
 		err = amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
-			peer->kgd, (struct kgd_mem *)mem, peer_pdd->drm_priv);
+			peer_pdd->dev->kgd, (struct kgd_mem *)mem, peer_pdd->drm_priv);
 		if (err) {
 			pr_err("Failed to map to gpu %d/%d\n",
 			       i, args->n_devices);
@@ -1428,12 +1461,10 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 
 	/* Flush TLBs after waiting for the page table updates to complete */
 	for (i = 0; i < args->n_devices; i++) {
-		peer = kfd_device_by_id(devices_arr[i]);
-		if (WARN_ON_ONCE(!peer))
-			continue;
-		peer_pdd = kfd_get_process_device_data(peer, p);
+		peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]);
 		if (WARN_ON_ONCE(!peer_pdd))
 			continue;
+
 		kfd_flush_tlb(peer_pdd);
 	}
 
@@ -1441,6 +1472,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
 
 	return err;
 
+get_process_device_data_failed:
 bind_process_to_device_failed:
 get_mem_obj_from_handle_failed:
 map_memory_to_gpu_failed:
@@ -1458,14 +1490,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	struct kfd_ioctl_unmap_memory_from_gpu_args *args = data;
 	struct kfd_process_device *pdd, *peer_pdd;
 	void *mem;
-	struct kfd_dev *dev, *peer;
 	long err = 0;
 	uint32_t *devices_arr = NULL, i;
 
-	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
-	if (!dev)
-		return -EINVAL;
-
 	if (!args->n_devices) {
 		pr_debug("Device IDs array empty\n");
 		return -EINVAL;
@@ -1489,8 +1516,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 
 	mutex_lock(&p->mutex);
-
-	pdd = kfd_get_process_device_data(dev, p);
+	pdd = kfd_process_device_data_by_id(p, GET_GPU_ID(args->handle));
 	if (!pdd) {
 		err = -EINVAL;
 		goto bind_process_to_device_failed;
@@ -1504,19 +1530,13 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
 	}
 
 	for (i = args->n_success; i < args->n_devices; i++) {
-		peer = kfd_device_by_id(devices_arr[i]);
-		if (!peer) {
-			err = -EINVAL;
-			goto get_mem_obj_from_handle_failed;
-		}
-
-		peer_pdd = kfd_get_process_device_data(peer, p);
+		peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]);
 		if (!peer_pdd) {
-			err = -ENODEV;
+			err = -EINVAL;
 			goto get_mem_obj_from_handle_failed;
 		}
 		err = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
-			peer->kgd, (struct kgd_mem *)mem, peer_pdd->drm_priv);
+			peer_pdd->dev->kgd, (struct kgd_mem *)mem, peer_pdd->drm_priv);
 		if (err) {
 			pr_err("Failed to unmap from gpu %d/%d\n",
 			       i, args->n_devices);
@@ -1645,23 +1665,26 @@ static int kfd_ioctl_import_dmabuf(struct file *filep,
 	void *mem;
 	int r;
 
-	dev = kfd_device_by_id(args->gpu_id);
-	if (!dev)
-		return -EINVAL;
+	mutex_lock(&p->mutex);
+	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+	if (!pdd) {
+		r = -EINVAL;
+		goto err_unlock;
+	}
 
 	dmabuf = dma_buf_get(args->dmabuf_fd);
-	if (IS_ERR(dmabuf))
-		return PTR_ERR(dmabuf);
-
-	mutex_lock(&p->mutex);
+	if (IS_ERR(dmabuf)) {
+		r = PTR_ERR(dmabuf);
+		goto err_unlock;
+	}
 
-	pdd = kfd_bind_process_to_device(dev, p);
+	pdd = kfd_bind_process_to_device(pdd->dev, p);
 	if (IS_ERR(pdd)) {
 		r = PTR_ERR(pdd);
 		goto err_unlock;
 	}
 
-	r = amdgpu_amdkfd_gpuvm_import_dmabuf(dev->kgd, dmabuf,
+	r = amdgpu_amdkfd_gpuvm_import_dmabuf(pdd->dev->kgd, dmabuf,
 					      args->va_addr, pdd->drm_priv,
 					      (struct kgd_mem **)&mem, &size,
 					      NULL);
@@ -1695,13 +1718,19 @@ static int kfd_ioctl_smi_events(struct file *filep,
 				struct kfd_process *p, void *data)
 {
 	struct kfd_ioctl_smi_events_args *args = data;
-	struct kfd_dev *dev;
+	struct kfd_process_device *pdd;
 
-	dev = kfd_device_by_id(args->gpuid);
-	if (!dev)
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_process_device_data_by_id(p, args->gpuid);
+	if (!pdd) {
+		mutex_unlock(&p->mutex);
 		return -EINVAL;
+	}
 
-	return kfd_smi_event_open(dev, &args->anon_fd);
+	mutex_unlock(&p->mutex);
+
+	return kfd_smi_event_open(pdd->dev, &args->anon_fd);
 }
 
 static int kfd_ioctl_set_xnack_mode(struct file *filep,
@@ -1800,6 +1829,57 @@ static int criu_dump_process(struct kfd_process *p, struct kfd_ioctl_criu_dumper
 	return ret;
 }
 
+static int criu_dump_devices(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
+{
+	struct kfd_criu_device_bucket *device_buckets;
+	int ret = 0, i;
+
+	if (args->num_objects != p->n_pdds) {
+		pr_err("Mismatch with number of devices (current:%d user:%lld)\n",
+							p->n_pdds, args->num_objects);
+		return -EINVAL;
+	}
+
+	if (args->objects_size != args->num_objects *
+		(sizeof(*device_buckets) + sizeof(struct kfd_criu_device_priv_data))) {
+		pr_err("Invalid objects size for devices\n");
+		return -EINVAL;
+	}
+
+	device_buckets = kvzalloc(args->objects_size, GFP_KERNEL);
+	if (!device_buckets)
+		return -ENOMEM;
+
+	/* Private data for devices is not currently used. To set private data:
+	 * struct kfd_criu_device_priv_data * device_privs = (struct kfd_criu_device_priv_data*)
+	 *				((uint8_t*)device_buckets +
+	 *				 (args->num_objects * (sizeof(*device_buckets))));
+	 */
+
+	for (i = 0; i < args->num_objects; i++) {
+		struct kfd_process_device *pdd = p->pdds[i];
+
+		device_buckets[i].user_gpu_id = pdd->user_gpu_id;
+		device_buckets[i].actual_gpu_id = pdd->dev->id;
+
+		/* priv_data does not contain useful information for now and is reserved for
+		 * future use, so we do not set its contents
+		 */
+		device_buckets[i].priv_data_offset = i * sizeof(struct kfd_criu_device_priv_data);
+		device_buckets[i].priv_data_size = sizeof(struct kfd_criu_device_priv_data);
+	}
+
+	ret = copy_to_user((void __user *)args->objects, device_buckets, args->objects_size);
+
+	if (ret) {
+		pr_err("Failed to copy device information to user\n");
+		ret = -EFAULT;
+	}
+
+	kvfree(device_buckets);
+	return ret;
+}
+
 uint64_t get_process_num_bos(struct kfd_process *p)
 {
 	uint64_t num_of_bos = 0, i;
@@ -2231,6 +2311,9 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
 	case KFD_CRIU_OBJECT_TYPE_PROCESS:
 		ret = criu_dump_process(p, args);
 		break;
+	case KFD_CRIU_OBJECT_TYPE_DEVICE:
+		ret = criu_dump_devices(p, args);
+		break;
 	case KFD_CRIU_OBJECT_TYPE_BO:
 		ret = criu_dump_bos(p, args);
 		break;
@@ -2240,7 +2323,6 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
 	case KFD_CRIU_OBJECT_TYPE_EVENT:
 		ret = criu_dump_events(p, args);
 		break;
-	case KFD_CRIU_OBJECT_TYPE_DEVICE:
 	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
 	default:
 		pr_err("Unsupported object type:%d\n", args->type);
@@ -2301,6 +2383,102 @@ static int criu_restore_process(struct kfd_process *p, struct kfd_ioctl_criu_res
 	return ret;
 }
 
+static int criu_restore_devices(struct kfd_process *p, struct kfd_ioctl_criu_restorer_args *args)
+{
+	int ret = 0, i;
+	uint8_t *objects;
+	struct kfd_criu_device_bucket *device_buckets;
+
+	if (args->num_objects != p->n_pdds)
+		return -EINVAL;
+
+	if (args->objects_size != args->num_objects *
+		(sizeof(*device_buckets) + sizeof(struct kfd_criu_device_priv_data))) {
+		pr_err("Invalid objects size for devices\n");
+		return -EINVAL;
+	}
+
+	objects = kmalloc(args->objects_size, GFP_KERNEL);
+	if (!objects)
+		return -ENOMEM;
+
+	ret = copy_from_user(objects, (void __user *)args->objects, args->objects_size);
+	if (ret) {
+		pr_err("Failed to copy devices information from user\n");
+		ret = -EFAULT;
+		goto exit;
+	}
+
+	device_buckets = (struct kfd_criu_device_bucket *) objects;
+
+	for (i = 0; i < args->num_objects; i++) {
+		struct kfd_dev *dev;
+		struct kfd_process_device *pdd;
+		struct file *drm_file;
+
+		/* device private data is not currently used. To access device private data:
+		 * uint8_t *private_datas = objects +
+		 *				(args->num_objects * sizeof(*device_buckets));
+		 *
+		 * struct kfd_criu_device_priv_data *device_priv =
+		 *			(struct kfd_criu_device_priv_data*)
+		 *			(private_datas + device_buckets[i].priv_data_offset);
+		 */
+
+		dev = kfd_device_by_id(device_buckets[i].actual_gpu_id);
+		if (!dev) {
+			pr_err("Failed to find device with gpu_id = %x\n",
+				device_buckets[i].actual_gpu_id);
+			ret = -EINVAL;
+			goto exit;
+		}
+
+		pdd = kfd_get_process_device_data(dev, p);
+		if (!pdd) {
+			pr_err("Failed to get pdd for gpu_id = %x\n",
+					device_buckets[i].actual_gpu_id);
+			ret = -EINVAL;
+			goto exit;
+		}
+		pdd->user_gpu_id = device_buckets[i].user_gpu_id;
+
+		if (pdd->drm_file) {
+			ret = -EINVAL;
+			goto exit;
+		}
+
+		drm_file = fget(device_buckets[i].drm_fd);
+		if (!drm_file) {
+			pr_err("Invalid render node file descriptor sent from plugin (%d)\n",
+				device_buckets[i].drm_fd);
+			ret = -EINVAL;
+			goto exit;
+		}
+
+		/* create the vm using render nodes for kfd pdd */
+		if (kfd_process_device_init_vm(pdd, drm_file)) {
+			pr_err("could not init vm for given pdd\n");
+			/* On success the pdd keeps the drm_file reference; only put it on failure */
+			fput(drm_file);
+			ret = -EINVAL;
+			goto exit;
+		}
+		/*
+		 * pdd already has the VM bound to the render node, so the call below
+		 * won't create a new exclusive KFD mapping but will reuse the existing
+		 * renderDXXX one. It is still needed for IOMMUv2 binding and runtime pm.
+		 */
+		pdd = kfd_bind_process_to_device(dev, p);
+		if (IS_ERR(pdd)) {
+			ret = PTR_ERR(pdd);
+			goto exit;
+		}
+	}
+exit:
+	kvfree(objects);
+	return ret;
+}
+
 static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restorer_args *args)
 {
 	uint8_t *objects, *private_data;
@@ -2719,6 +2897,9 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
 	case KFD_CRIU_OBJECT_TYPE_PROCESS:
 		ret = criu_restore_process(p, args);
 		break;
+	case KFD_CRIU_OBJECT_TYPE_DEVICE:
+		ret = criu_restore_devices(p, args);
+		break;
 	case KFD_CRIU_OBJECT_TYPE_BO:
 		ret = criu_restore_bos(p, args);
 		break;
@@ -2728,7 +2909,6 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
 	case KFD_CRIU_OBJECT_TYPE_EVENT:
 		ret = criu_restore_events(filep, p, args);
 		break;
-	case KFD_CRIU_OBJECT_TYPE_DEVICE:
 	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
 	default:
 		pr_err("Unsupported object type:%d\n", args->type);
@@ -2819,6 +2999,11 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
 
 	args->process_priv_data_size = sizeof(struct kfd_criu_process_priv_data);
 
+	args->total_devices = p->n_pdds;
+	/* devices_priv_data_size does not contain any useful information for now */
+	args->devices_priv_data_size = args->total_devices *
+					sizeof(struct kfd_criu_device_priv_data);
+
 	args->total_bos = get_process_num_bos(p);
 	args->bos_priv_data_size = args->total_bos * sizeof(struct kfd_criu_bo_priv_data);
 
@@ -2832,7 +3017,8 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
 	args->total_events = kfd_get_num_events(p);
 	args->events_priv_data_size = args->total_events * sizeof(struct kfd_criu_event_priv_data);
 
-	dev_dbg(kfd_device, "Num of bos:%llu queues:%u events:%u\n",
+	dev_dbg(kfd_device, "Num of devices:%u bos:%llu queues:%u events:%u\n",
+				args->total_devices,
 				args->total_bos,
 				args->total_queues,
 				args->total_events);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 18362478e351..5e9067b70908 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -343,11 +343,12 @@ int kfd_kmap_event_page(struct kfd_process *p, uint64_t event_page_offset)
 		return -EINVAL;
 	}
 
-	kfd = kfd_device_by_id(GET_GPU_ID(event_page_offset));
-	if (!kfd) {
+	pdd = kfd_process_device_data_by_id(p, GET_GPU_ID(event_page_offset));
+	if (!pdd) {
 		pr_err("Getting device by id failed in %s\n", __func__);
 		return -EINVAL;
 	}
+	kfd = pdd->dev;
 
 	pdd = kfd_bind_process_to_device(kfd, p);
 	if (IS_ERR(pdd))
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index bf10a5305ef7..1912df8d9101 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -759,6 +759,13 @@ struct kfd_process_device {
 	 *  number of CU's a device has along with number of other competing processes
 	 */
 	struct attribute attr_cu_occupancy;
+
+	/*
+	 * If this process has been checkpointed before, then the user
+	 * application will use the original gpu_id on the
+	 * checkpointed node to refer to this device.
+	 */
+	uint32_t user_gpu_id;
 };
 
 #define qpd_to_pdd(x) container_of(x, struct kfd_process_device, qpd)
@@ -914,6 +921,9 @@ int kfd_process_restore_queues(struct kfd_process *p);
 void kfd_suspend_all_processes(void);
 int kfd_resume_all_processes(void);
 
+struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *process,
+				uint32_t gpu_id);
+
 int kfd_process_device_init_vm(struct kfd_process_device *pdd,
 			       struct file *drm_file);
 struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index e4cb2f778590..a23f2162eb8b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1425,6 +1425,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	pdd->runtime_inuse = false;
 	pdd->vram_usage = 0;
 	pdd->sdma_past_activity_counter = 0;
+	pdd->user_gpu_id = dev->id;
 	atomic64_set(&pdd->evict_duration_counter, 0);
 	p->pdds[p->n_pdds++] = pdd;
 
@@ -1898,6 +1899,23 @@ void kfd_flush_tlb(struct kfd_process_device *pdd)
 	}
 }
 
+struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)
+{
+	int i;
+
+	if (gpu_id) {
+		for (i = 0; i < p->n_pdds; i++) {
+			struct kfd_process_device *pdd = p->pdds[i];
+
+			if (pdd->user_gpu_id == gpu_id)
+				return pdd;
+		}
+
+		WARN_ONCE(1, "Failed to find mapping for gpu = 0x%x\n", gpu_id);
+	}
+	return NULL;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 int kfd_debugfs_mqds_by_process(struct seq_file *m, void *data)
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 17/18] Revert "drm/amdgpu: Remove verify_access shortcut for KFD BOs"
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (15 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 16/18] drm/amdkfd: CRIU implement gpu_id remapping David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  2021-08-19 13:37 ` [PATCH 18/18] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects David Yat Sin
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

This reverts commit 12ebe2b9df192a2a8580cd9ee3e9940c116913c8.

This is just a temporary workaround and will be dropped later.

Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 99ea29fd12bd..be7eb85af066 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -178,6 +178,13 @@ static int amdgpu_verify_access(struct ttm_buffer_object *bo, struct file *filp)
 {
 	struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
 
+	/*
+	 * Don't verify access for KFD BOs. They don't have a GEM
+	 * object associated with them.
+	 */
+	if (abo->kfd_bo)
+		return 0;
+
 	if (amdgpu_ttm_tt_get_usermm(bo->ttm))
 		return -EPERM;
 	return drm_vma_node_verify_access(&abo->tbo.base.vma_node,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 18/18] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects
  2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
                   ` (16 preceding siblings ...)
  2021-08-19 13:37 ` [PATCH 17/18] Revert "drm/amdgpu: Remove verify_access shortcut for KFD BOs" David Yat Sin
@ 2021-08-19 13:37 ` David Yat Sin
  17 siblings, 0 replies; 25+ messages in thread
From: David Yat Sin @ 2021-08-19 13:37 UTC (permalink / raw)
  To: amd-gfx; +Cc: felix.kuehling, rajneesh.bhardwaj, David Yat Sin

From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>

KFD buffer objects do not have a GEM handle associated with them, so
they cannot be used directly with libdrm to initiate a system DMA
(sDMA) operation to speed up the checkpoint and restore operations.
Export them as dmabuf objects instead and use them with the libdrm
helper (amdgpu_bo_import) to further process the sDMA command
submissions.

With sDMA, we see a huge improvement in checkpoint and restore
operations compared to generic PCI-based access via the host data
path.
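
For reference, a minimal plugin-side sketch of consuming the exported
fd with the libdrm helper (a sketch only: the real code lives in the
CRIU amdgpu plugin linked from the cover letter, and
import_checkpoint_bo is a hypothetical name):

	#include <amdgpu.h>

	/* Import the dmabuf fd exported by the dumper ioctl so the BO can
	 * be accessed through libdrm and copied with sDMA. Error handling
	 * is abbreviated.
	 */
	static int import_checkpoint_bo(amdgpu_device_handle dev, int dmabuf_fd,
					amdgpu_bo_handle *bo)
	{
		struct amdgpu_bo_import_result res;
		int r;

		r = amdgpu_bo_import(dev, amdgpu_bo_handle_type_dma_buf_fd,
				     dmabuf_fd, &res);
		if (r)
			return r;

		*bo = res.buf_handle;	/* target handle for sDMA copies */
		return 0;
	}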

Suggested-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 57 ++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 90e4d4ce4398..ead4cb37377b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -35,6 +35,7 @@
 #include <linux/mman.h>
 #include <linux/ptrace.h>
 #include <linux/dma-buf.h>
+#include <linux/fdtable.h>
 #include <asm/processor.h>
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
@@ -43,6 +44,7 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_object.h"
+#include "amdgpu_dma_buf.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
@@ -1900,6 +1902,33 @@ uint64_t get_process_num_bos(struct kfd_process *p)
 	return num_of_bos;
 }
 
+static int criu_get_prime_handle(struct drm_gem_object *gobj, int flags,
+				      u32 *shared_fd)
+{
+	struct dma_buf *dmabuf;
+	int ret;
+
+	dmabuf = amdgpu_gem_prime_export(gobj, flags);
+	if (IS_ERR(dmabuf)) {
+		ret = PTR_ERR(dmabuf);
+		pr_err("dmabuf export failed for the BO\n");
+		return ret;
+	}
+
+	ret = dma_buf_fd(dmabuf, flags);
+	if (ret < 0) {
+		pr_err("dmabuf create fd failed, ret:%d\n", ret);
+		goto out_free_dmabuf;
+	}
+
+	*shared_fd = ret;
+	return 0;
+
+out_free_dmabuf:
+	dma_buf_put(dmabuf);
+	return ret;
+}
+
 static int criu_dump_bos(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
 {
 	struct kfd_criu_bo_bucket *bo_buckets;
@@ -1969,6 +1998,14 @@ static int criu_dump_bos(struct kfd_process *p, struct kfd_ioctl_criu_dumper_arg
 					goto exit;
 				}
 			}
+			if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+				ret = criu_get_prime_handle(&dumper_bo->tbo.base,
+						bo_bucket->alloc_flags &
+						KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE ? DRM_RDWR : 0,
+						&bo_bucket->dmabuf_fd);
+				if (ret)
+					goto exit;
+			}
 			if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL)
 				bo_bucket->offset = KFD_MMAP_TYPE_DOORBELL |
 					KFD_MMAP_GPU_ID(pdd->dev->id);
@@ -1998,6 +2035,11 @@ static int criu_dump_bos(struct kfd_process *p, struct kfd_ioctl_criu_dumper_arg
 	}
 
 exit:
+	while (ret && bo_index--) {
+		if (bo_buckets[bo_index].alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+			close_fd(bo_buckets[bo_index].dmabuf_fd);
+	}
+
 	kvfree(bo_buckets);
 	return ret;
 }
@@ -2516,6 +2558,7 @@ static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restore
 		struct kfd_criu_bo_priv_data *bo_priv;
 		struct kfd_dev *dev;
 		struct kfd_process_device *pdd;
+		struct kgd_mem *kgd_mem;
 		void *mem;
 		u64 offset;
 		int idr_handle;
@@ -2663,6 +2706,16 @@ static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restore
 		}
 
 		pr_debug("map memory was successful for the BO\n");
+		/* create the dmabuf object and export the bo */
+		kgd_mem = (struct kgd_mem *)mem;
+		if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) {
+			ret = criu_get_prime_handle(&kgd_mem->bo->tbo.base,
+						    DRM_RDWR,
+						    &bo_bucket->dmabuf_fd);
+			if (ret)
+				goto exit;
+		}
+
 	} /* done */
 
 	/* Flush TLBs after waiting for the page table updates to complete */
@@ -2687,6 +2740,10 @@ static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restore
 		ret = -EFAULT;
 
 exit:
+	while (ret && i--) {
+		if (bo_buckets[i].alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM)
+			close_fd(bo_buckets[i].dmabuf_fd);
+	}
 	kvfree(objects);
 	return ret;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 09/18] drm/amdkfd: CRIU add queues support
  2021-08-19 13:37 ` [PATCH 09/18] drm/amdkfd: CRIU add queues support David Yat Sin
@ 2021-08-23 18:29   ` Felix Kuehling
  0 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2021-08-23 18:29 UTC (permalink / raw)
  To: David Yat Sin, amd-gfx; +Cc: rajneesh.bhardwaj

kfd_chardev.c should contain the ioctl API, but not the whole
implementation of everything. I think it would make sense to move the
criu_dump_queue* functions into kfd_process_queue_manager.c, as
sketched below.
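
Something like this split, for example (hypothetical prototype, exact
naming up to you):

	/* kfd_process_queue_manager.c: owns the iteration over queues */
	int pqm_dump_queues(struct kfd_process *p,
			    struct kfd_ioctl_criu_dumper_args *args);

	/* kfd_chardev.c then keeps only the ioctl dispatch */
	case KFD_CRIU_OBJECT_TYPE_QUEUE:
		ret = pqm_dump_queues(p, args);
		break;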

Regards,
  Felix


On 2021-08-19 at 9:37 a.m., David Yat Sin wrote:
> Add support to existing CRIU ioctl's to save number of queues and queue
> properties for each queue during checkpoint and re-create queues on
> restore.
>
> Signed-off-by: David Yat Sin <david.yatsin@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 380 ++++++++++++++++++++++-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  22 +-
>  2 files changed, 400 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 24e5c53261f5..6f1c9fb8d46c 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1965,6 +1965,213 @@ static int criu_dump_bos(struct kfd_process *p, struct kfd_ioctl_criu_dumper_arg
>  	return ret;
>  }
>  
> +static void get_queue_data_sizes(struct kfd_process_device *pdd,
> +				struct queue *q,
> +				uint32_t *cu_mask_size)
> +{
> +	*cu_mask_size = sizeof(uint32_t) * (q->properties.cu_mask_count / 32);
> +}
> +
> +int get_process_queue_info(struct kfd_process *p, uint32_t *num_queues, uint32_t *q_data_sizes)
> +{
> +	u32 data_sizes = 0;
> +	u32 q_index = 0;
> +	struct queue *q;
> +	int i;
> +
> +	/* Run over all PDDs of the process */
> +	for (i = 0; i < p->n_pdds; i++) {
> +		struct kfd_process_device *pdd = p->pdds[i];
> +
> +		list_for_each_entry(q, &pdd->qpd.queues_list, list) {
> +			if (q->properties.type == KFD_QUEUE_TYPE_COMPUTE ||
> +				q->properties.type == KFD_QUEUE_TYPE_SDMA ||
> +				q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI) {
> +				u32 cu_mask_size;
> +
> +				get_queue_data_sizes(pdd, q, &cu_mask_size);
> +
> +				data_sizes += cu_mask_size;
> +				q_index++;
> +			} else {
> +				pr_err("Unsupported queue type (%d)\n", q->properties.type);
> +				return -EOPNOTSUPP;
> +			}
> +		}
> +	}
> +	*num_queues = q_index;
> +	*q_data_sizes = data_sizes;
> +
> +	return 0;
> +}
> +
> +static void criu_dump_queue(struct kfd_process_device *pdd,
> +			   struct queue *q,
> +			   struct kfd_criu_queue_bucket *q_bucket,
> +			   void *private_data)
> +{
> +	struct kfd_criu_queue_priv_data *q_data = (struct kfd_criu_queue_priv_data *) private_data;
> +	uint8_t *cu_mask;
> +
> +	cu_mask = (void *)(q_data + 1);
> +
> +	q_bucket->gpu_id = pdd->dev->id;
> +	q_data->type = q->properties.type;
> +	q_data->format = q->properties.format;
> +	q_data->q_id =  q->properties.queue_id;
> +	q_data->q_address = q->properties.queue_address;
> +	q_data->q_size = q->properties.queue_size;
> +	q_data->priority = q->properties.priority;
> +	q_data->q_percent = q->properties.queue_percent;
> +	q_data->read_ptr_addr = (uint64_t)q->properties.read_ptr;
> +	q_data->write_ptr_addr = (uint64_t)q->properties.write_ptr;
> +	q_data->doorbell_id = q->doorbell_id;
> +
> +	q_data->sdma_id = q->sdma_id;
> +
> +	q_data->eop_ring_buffer_address =
> +		q->properties.eop_ring_buffer_address;
> +
> +	q_data->eop_ring_buffer_size = q->properties.eop_ring_buffer_size;
> +
> +	q_data->ctx_save_restore_area_address =
> +		q->properties.ctx_save_restore_area_address;
> +
> +	q_data->ctx_save_restore_area_size =
> +		q->properties.ctx_save_restore_area_size;
> +
> +	if (q_data->cu_mask_size)
> +		memcpy(cu_mask, q->properties.cu_mask, q_data->cu_mask_size);
> +
> +	pr_debug("Dumping Queue: gpu_id:%x queue_id:%u\n", q_bucket->gpu_id, q_data->q_id);
> +}
> +
> +static int criu_dump_queues_device(struct kfd_process_device *pdd,
> +				unsigned int *q_index,
> +				unsigned int max_num_queues,
> +				struct kfd_criu_queue_bucket *q_buckets,
> +				uint8_t *user_priv_data,
> +				uint64_t *queues_priv_data_offset)
> +{
> +	struct queue *q;
> +	uint8_t *q_private_data = NULL; /* Local buffer to store individual queue private data */
> +	unsigned int q_private_data_size = 0;
> +	int ret = 0;
> +
> +	list_for_each_entry(q, &pdd->qpd.queues_list, list) {
> +		struct kfd_criu_queue_bucket q_bucket;
> +		struct kfd_criu_queue_priv_data *q_data;
> +		uint64_t q_data_size;
> +		uint32_t cu_mask_size;
> +
> +		if (q->properties.type != KFD_QUEUE_TYPE_COMPUTE &&
> +			q->properties.type != KFD_QUEUE_TYPE_SDMA &&
> +			q->properties.type != KFD_QUEUE_TYPE_SDMA_XGMI) {
> +
> +			pr_err("Unsupported queue type (%d)\n", q->properties.type);
> +			return -EOPNOTSUPP;
> +		}
> +
> +		memset(&q_bucket, 0, sizeof(q_bucket));
> +
> +		get_queue_data_sizes(pdd, q, &cu_mask_size);
> +
> +		q_data_size = sizeof(*q_data) + cu_mask_size;
> +
> +		/* Increase local buffer space if needed */
> +		if (q_private_data_size < q_data_size) {
> +			kfree(q_private_data);
> +
> +			q_private_data = kzalloc(q_data_size, GFP_KERNEL);
> +			if (!q_private_data) {
> +				ret = -ENOMEM;
> +				break;
> +			}
> +			q_private_data_size = q_data_size;
> +		}
> +
> +		q_data = (struct kfd_criu_queue_priv_data *)q_private_data;
> +
> +		q_data->cu_mask_size = cu_mask_size;
> +
> +		criu_dump_queue(pdd, q, &q_bucket, q_data);
> +
> +		q_bucket.priv_data_offset = *queues_priv_data_offset;
> +		q_bucket.priv_data_size = q_data_size;
> +
> +		ret = copy_to_user((void __user *) (user_priv_data + q_bucket.priv_data_offset),
> +				q_private_data, q_bucket.priv_data_size);
> +		if (ret) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +		*queues_priv_data_offset += q_data_size;
> +
> +		ret = copy_to_user((void __user *)&q_buckets[*q_index],
> +					&q_bucket, sizeof(q_bucket));
> +		if (ret) {
> +			pr_err("Failed to copy queue information to user\n");
> +			ret = -EFAULT;
> +			break;
> +		}
> +		*q_index = *q_index + 1;
> +	}
> +
> +	kfree(q_private_data);
> +
> +	return ret;
> +}
> +
> +static int criu_dump_queues(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
> +{
> +	struct kfd_criu_queue_bucket *queue_buckets;
> +	uint32_t num_queues, queue_extra_data_sizes;
> +	uint64_t queues_priv_data_offset = 0;
> +	int ret = 0, pdd_index, q_index = 0;
> +	void *private_data; /* Pointer to first private data in userspace */
> +
> +	ret = get_process_queue_info(p, &num_queues, &queue_extra_data_sizes);
> +	if (ret)
> +		return ret;
> +
> +	if (args->num_objects != num_queues) {
> +		pr_err("Mismatch with number of queues (current:%d user:%lld)\n",
> +							num_queues, args->num_objects);
> +		return -EINVAL;
> +	}
> +
> +	if (args->objects_size != queue_extra_data_sizes +
> +				  (num_queues * (sizeof(*queue_buckets) +
> +						 sizeof(struct kfd_criu_queue_priv_data)))) {
> +		pr_err("Invalid objects size for queues\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Queue private data size for each queue can vary in size as it also includes cu_mask, mqd
> +	 * and ctl_stack. First queue private data starts after all queue_buckets
> +	 */
> +
> +	queue_buckets = (struct kfd_criu_queue_bucket *)args->objects;
> +	private_data = (void *)(queue_buckets + args->num_objects);
> +
> +	for (pdd_index = 0; pdd_index < p->n_pdds; pdd_index++) {
> +		struct kfd_process_device *pdd = p->pdds[pdd_index];
> +
> +		/* criu_dump_queues_device will copy data to user */
> +		ret = criu_dump_queues_device(pdd,
> +					      &q_index,
> +					      args->num_objects,
> +					      queue_buckets,
> +					      private_data,
> +					      &queues_priv_data_offset);
> +
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
>  static int kfd_ioctl_criu_dumper(struct file *filep,
>  				struct kfd_process *p, void *data)
>  {
> @@ -2000,6 +2207,8 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
>  		ret = criu_dump_bos(p, args);
>  		break;
>  	case KFD_CRIU_OBJECT_TYPE_QUEUE:
> +		ret = criu_dump_queues(p, args);
> +		break;
>  	case KFD_CRIU_OBJECT_TYPE_EVENT:
>  	case KFD_CRIU_OBJECT_TYPE_DEVICE:
>  	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
> @@ -2274,6 +2483,163 @@ static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restore
>  	return ret;
>  }
>  
> +static int set_queue_properties_from_criu(struct queue_properties *qp,
> +					  struct kfd_criu_queue_bucket *q_bucket,
> +					  struct kfd_criu_queue_priv_data *q_data,
> +					  void *cu_mask)
> +{
> +	qp->is_interop = false;
> +	qp->is_gws = q_data->is_gws;
> +	qp->queue_percent = q_data->q_percent;
> +	qp->priority = q_data->priority;
> +	qp->queue_address = q_data->q_address;
> +	qp->queue_size = q_data->q_size;
> +	qp->read_ptr = (uint32_t *) q_data->read_ptr_addr;
> +	qp->write_ptr = (uint32_t *) q_data->write_ptr_addr;
> +	qp->eop_ring_buffer_address = q_data->eop_ring_buffer_address;
> +	qp->eop_ring_buffer_size = q_data->eop_ring_buffer_size;
> +	qp->ctx_save_restore_area_address = q_data->ctx_save_restore_area_address;
> +	qp->ctx_save_restore_area_size = q_data->ctx_save_restore_area_size;
> +	qp->ctl_stack_size = q_data->ctl_stack_size;
> +	qp->type = q_data->type;
> +	qp->format = q_data->format;
> +
> +	if (q_data->cu_mask_size) {
> +		qp->cu_mask = kzalloc(q_data->cu_mask_size, GFP_KERNEL);
> +		if (!qp->cu_mask)
> +			return -ENOMEM;
> +
> +		/* CU mask is stored after q_data */
> +		memcpy(qp->cu_mask, cu_mask, q_data->cu_mask_size);
> +		qp->cu_mask_count = (q_data->cu_mask_size / sizeof(uint32_t)) * 32;
> +	}
> +
> +	return 0;
> +}
> +
> +static int criu_restore_queue(struct kfd_process *p,
> +			      struct kfd_dev *dev,
> +			      struct kfd_process_device *pdd,
> +			      struct kfd_criu_queue_bucket *q_bucket,
> +			      void *private_data)
> +{
> +	struct kfd_criu_queue_priv_data *q_data = (struct kfd_criu_queue_priv_data *) private_data;
> +	uint8_t *cu_mask, *mqd, *ctl_stack;
> +	struct queue_properties qp;
> +	unsigned int queue_id;
> +	int ret = 0;
> +
> +	pr_debug("Restoring Queue: gpu_id:%x queue_id:%u\n", q_bucket->gpu_id, q_data->q_id);
> +
> +	/* data stored in this order: cu_mask, mqd, ctl_stack */
> +	cu_mask = (void *)(q_data + 1);
> +	mqd = cu_mask + q_data->cu_mask_size;
> +	ctl_stack = mqd + q_data->mqd_size;
> +
> +	memset(&qp, 0, sizeof(qp));
> +	ret = set_queue_properties_from_criu(&qp, q_bucket, q_data, cu_mask);
> +	if (ret)
> +		goto err_create_queue;
> +
> +	print_queue_properties(&qp);
> +
> +	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, NULL);
> +	if (ret) {
> +		pr_err("Failed to create new queue err:%d\n", ret);
> +		ret = -EINVAL;
> +		goto err_create_queue;
> +	}
> +
> +	pr_debug("Queue id %d was restored successfully\n", queue_id);
> +
> +	return 0;
> +err_create_queue:
> +	kfree(qp.cu_mask);
> +
> +	return ret;
> +}
> +
> +static int criu_restore_queues(struct kfd_process *p,
> +			       struct kfd_ioctl_criu_restorer_args *args)
> +{
> +	int ret = 0, i;
> +	struct kfd_criu_queue_bucket *user_buckets;
> +	uint8_t *all_private_data; /* Pointer to first private data in userspace */
> +	uint8_t *q_private_data = NULL; /* Local buffer for individual queue private data */
> +	unsigned int q_private_data_size = 0;
> +
> +	user_buckets = (struct kfd_criu_queue_bucket *)args->objects;
> +	all_private_data = (void *)(user_buckets + args->num_objects);
> +
> +	/*
> +	 * This process will not have any queues at this point, but we are
> +	 * setting all the dqm's for this process to evicted state.
> +	 */
> +	kfd_process_evict_queues(p);
> +
> +	for (i = 0; i < args->num_objects; i++) {
> +		struct kfd_process_device *pdd;
> +		struct kfd_dev *dev;
> +		struct kfd_criu_queue_bucket q_bucket;
> +
> +		ret = copy_from_user(&q_bucket, (void __user *)&user_buckets[i],
> +				sizeof(struct kfd_criu_queue_bucket));
> +
> +		if (ret) {
> +			ret = -EFAULT;
> +			goto exit;
> +		}
> +
> +		/* Increase local buffer space if needed */
> +		if (q_bucket.priv_data_size > q_private_data_size) {
> +			kfree(q_private_data);
> +
> +			q_private_data = kmalloc(q_bucket.priv_data_size, GFP_KERNEL);
> +			if (!q_private_data) {
> +				ret = -ENOMEM;
> +				goto exit;
> +			}
> +			q_private_data_size = q_bucket.priv_data_size;
> +		}
> +
> +		ret = copy_from_user(q_private_data,
> +				(void __user *) (all_private_data + q_bucket.priv_data_offset),
> +				q_bucket.priv_data_size);
> +		if (ret) {
> +			ret = -EFAULT;
> +			goto exit;
> +		}
> +
> +		dev = kfd_device_by_id(q_bucket.gpu_id);
> +		if (!dev) {
> +			pr_err("Could not get kfd_dev from gpu_id = 0x%x\n",
> +			q_bucket.gpu_id);
> +
> +			ret = -EINVAL;
> +			goto exit;
> +		}
> +
> +		pdd = kfd_get_process_device_data(dev, p);
> +		if (!pdd) {
> +			pr_err("Failed to get pdd\n");
> +			ret = -EFAULT;
> +			goto exit;
> +		}
> +
> +		ret = criu_restore_queue(p, dev, pdd, &q_bucket, q_private_data);
> +		if (ret) {
> +			pr_err("Failed to restore queue (%d)\n", ret);
> +			goto exit;
> +		}
> +
> +	}
> +
> +exit:
> +	kfree(q_private_data);
> +
> +	return ret;
> +}
> +
>  static int kfd_ioctl_criu_restorer(struct file *filep,
>  				struct kfd_process *p, void *data)
>  {
> @@ -2293,6 +2659,8 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
>  		ret = criu_restore_bos(p, args);
>  		break;
>  	case KFD_CRIU_OBJECT_TYPE_QUEUE:
> +		ret = criu_restore_queues(p, args);
> +		break;
>  	case KFD_CRIU_OBJECT_TYPE_EVENT:
>  	case KFD_CRIU_OBJECT_TYPE_DEVICE:
>  	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
> @@ -2368,6 +2736,7 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
>  				struct kfd_process *p, void *data)
>  {
>  	struct kfd_ioctl_criu_process_info_args *args = data;
> +	uint32_t queues_extra_data_size;
>  	int ret = 0;
>  
>  	pr_debug("Inside %s\n", __func__);
> @@ -2387,7 +2756,16 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
>  	args->total_bos = get_process_num_bos(p);
>  	args->bos_priv_data_size = args->total_bos * sizeof(struct kfd_criu_bo_priv_data);
>  
> -	dev_dbg(kfd_device, "Num of bos:%llu\n", args->total_bos);
> +	ret = get_process_queue_info(p, &args->total_queues, &queues_extra_data_size);
> +	if (ret)
> +		goto err_unlock;
> +
> +	args->queues_priv_data_size = queues_extra_data_size +
> +				(args->total_queues * sizeof(struct kfd_criu_queue_priv_data));
> +
> +	dev_dbg(kfd_device, "Num of bos:%llu queues:%u\n",
> +				args->total_bos,
> +				args->total_queues);
>  err_unlock:
>  	mutex_unlock(&p->mutex);
>  	return ret;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 0b8165729cde..4b4808b191f2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1044,7 +1044,27 @@ struct kfd_criu_svm_range_priv_data {
>  };
>  
>  struct kfd_criu_queue_priv_data {
> -	uint64_t reserved;
> +	uint64_t q_address;
> +	uint64_t q_size;
> +	uint64_t read_ptr_addr;
> +	uint64_t write_ptr_addr;
> +	uint64_t doorbell_off;
> +	uint64_t eop_ring_buffer_address;
> +	uint64_t ctx_save_restore_area_address;
> +	uint32_t gpu_id;
> +	uint32_t type;
> +	uint32_t format;
> +	uint32_t q_id;
> +	uint32_t priority;
> +	uint32_t q_percent;
> +	uint32_t doorbell_id;
> +	uint32_t is_gws;
> +	uint32_t sdma_id;
> +	uint32_t eop_ring_buffer_size;
> +	uint32_t ctx_save_restore_area_size;
> +	uint32_t ctl_stack_size;
> +	uint32_t cu_mask_size;
> +	uint32_t mqd_size;
>  };
>  
>  struct kfd_criu_event_priv_data {

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 10/18] drm/amdkfd: CRIU restore queue ids
  2021-08-19 13:37 ` [PATCH 10/18] drm/amdkfd: CRIU restore queue ids David Yat Sin
@ 2021-08-23 18:29   ` Felix Kuehling
  0 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2021-08-23 18:29 UTC (permalink / raw)
  To: David Yat Sin, amd-gfx; +Cc: rajneesh.bhardwaj

On 2021-08-19 at 9:37 a.m., David Yat Sin wrote:
> When re-creating queues during CRIU restore, restore the queue with the
> same queue id value used during CRIU dump. Adding a new private
> structure queue_restore_data to store queue restore information.

The sentence about the queue_restore_data structure is outdated.

Regards,
  Felix


>
> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
> Signed-off-by: David Yat Sin <david.yatsin@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c      |  4 ++--
>  drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c       |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  2 ++
>  .../amd/amdkfd/kfd_process_queue_manager.c    | 22 ++++++++++++++++++-
>  4 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 6f1c9fb8d46c..813ed42e3ce6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -312,7 +312,7 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>  			p->pasid,
>  			dev->id);
>  
> -	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id,
> +	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, &queue_id, NULL,
>  			&doorbell_offset_in_process);
>  	if (err != 0)
>  		goto err_create_queue;
> @@ -2543,7 +2543,7 @@ static int criu_restore_queue(struct kfd_process *p,
>  
>  	print_queue_properties(&qp);
>  
> -	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, NULL);
> +	ret = pqm_create_queue(&p->pqm, dev, NULL, &qp, &queue_id, q_data, NULL);
>  	if (ret) {
>  		pr_err("Failed to create new queue err:%d\n", ret);
>  		ret = -EINVAL;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> index 159add0f5aaa..749a7a3bf191 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c
> @@ -185,7 +185,7 @@ static int dbgdev_register_diq(struct kfd_dbgdev *dbgdev)
>  	properties.type = KFD_QUEUE_TYPE_DIQ;
>  
>  	status = pqm_create_queue(dbgdev->pqm, dbgdev->dev, NULL,
> -				&properties, &qid, NULL);
> +				&properties, &qid, NULL, NULL);
>  
>  	if (status) {
>  		pr_err("Failed to create DIQ\n");
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 4b4808b191f2..eaf5fe1480e9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -468,6 +468,7 @@ enum KFD_QUEUE_PRIORITY {
>   * it's user mode or kernel mode queue.
>   *
>   */
> +
>  struct queue_properties {
>  	enum kfd_queue_type type;
>  	enum kfd_queue_format format;
> @@ -1114,6 +1115,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
>  			    struct file *f,
>  			    struct queue_properties *properties,
>  			    unsigned int *qid,
> +			    const struct kfd_criu_queue_priv_data *q_data,
>  			    uint32_t *p_doorbell_offset_in_process);
>  int pqm_destroy_queue(struct process_queue_manager *pqm, unsigned int qid);
>  int pqm_update_queue(struct process_queue_manager *pqm, unsigned int qid,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 95a6c36cea4c..e6abab16b8de 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -42,6 +42,20 @@ static inline struct process_queue_node *get_queue_by_qid(
>  	return NULL;
>  }
>  
> +static int assign_queue_slot_by_qid(struct process_queue_manager *pqm,
> +				    unsigned int qid)
> +{
> +	if (qid >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
> +		return -EINVAL;
> +
> +	if (__test_and_set_bit(qid, pqm->queue_slot_bitmap)) {
> +		pr_err("Cannot create new queue because requested qid(%u) is in use\n", qid);
> +		return -ENOSPC;
> +	}
> +
> +	return 0;
> +}
> +
>  static int find_available_queue_slot(struct process_queue_manager *pqm,
>  					unsigned int *qid)
>  {
> @@ -193,6 +207,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
>  			    struct file *f,
>  			    struct queue_properties *properties,
>  			    unsigned int *qid,
> +			    const struct kfd_criu_queue_priv_data *q_data,
>  			    uint32_t *p_doorbell_offset_in_process)
>  {
>  	int retval;
> @@ -224,7 +239,12 @@ int pqm_create_queue(struct process_queue_manager *pqm,
>  	if (pdd->qpd.queue_count >= max_queues)
>  		return -ENOSPC;
>  
> -	retval = find_available_queue_slot(pqm, qid);
> +	if (q_data) {
> +		retval = assign_queue_slot_by_qid(pqm, q_data->q_id);
> +		*qid = q_data->q_id;
> +	} else
> +		retval = find_available_queue_slot(pqm, qid);
> +
>  	if (retval != 0)
>  		return retval;
>  

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 15/18] drm/amdkfd: CRIU dump and restore events
  2021-08-19 13:37 ` [PATCH 15/18] drm/amdkfd: CRIU dump and restore events David Yat Sin
@ 2021-08-23 18:39   ` Felix Kuehling
  0 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2021-08-23 18:39 UTC (permalink / raw)
  To: David Yat Sin, amd-gfx; +Cc: rajneesh.bhardwaj


On 2021-08-19 at 9:37 a.m., David Yat Sin wrote:
> Add support to existing CRIU ioctl's to save and restore events during
> criu checkpoint and restore.
>
> Signed-off-by: David Yat Sin <david.yatsin@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 130 +++++++-----
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c  | 253 ++++++++++++++++++++---
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  25 ++-
>  3 files changed, 329 insertions(+), 79 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 19f16e3dd769..c8f523d8ab81 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -1008,51 +1008,11 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
>  	 * through the event_page_offset field.
>  	 */
>  	if (args->event_page_offset) {
> -		struct kfd_dev *kfd;
> -		struct kfd_process_device *pdd;
> -		void *mem, *kern_addr;
> -		uint64_t size;
> -
> -		if (p->signal_page) {
> -			pr_err("Event page is already set\n");
> -			return -EINVAL;
> -		}
> -
> -		kfd = kfd_device_by_id(GET_GPU_ID(args->event_page_offset));
> -		if (!kfd) {
> -			pr_err("Getting device by id failed in %s\n", __func__);
> -			return -EINVAL;
> -		}
> -
>  		mutex_lock(&p->mutex);
> -		pdd = kfd_bind_process_to_device(kfd, p);
> -		if (IS_ERR(pdd)) {
> -			err = PTR_ERR(pdd);
> -			goto out_unlock;
> -		}
> -
> -		mem = kfd_process_device_translate_handle(pdd,
> -				GET_IDR_HANDLE(args->event_page_offset));
> -		if (!mem) {
> -			pr_err("Can't find BO, offset is 0x%llx\n",
> -			       args->event_page_offset);
> -			err = -EINVAL;
> -			goto out_unlock;
> -		}
> +		err = kfd_kmap_event_page(p, args->event_page_offset);
>  		mutex_unlock(&p->mutex);
> -
> -		err = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(kfd->kgd,
> -						mem, &kern_addr, &size);
> -		if (err) {
> -			pr_err("Failed to map event page to kernel\n");
> -			return err;
> -		}
> -
> -		err = kfd_event_page_set(p, kern_addr, size);
> -		if (err) {
> -			pr_err("Failed to set event page\n");
> +		if (err)
>  			return err;
> -		}
>  	}
>  
>  	err = kfd_event_create(filp, p, args->event_type,
> @@ -1061,10 +1021,7 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
>  				&args->event_page_offset,
>  				&args->event_slot_index);
>  
> -	return err;
> -
> -out_unlock:
> -	mutex_unlock(&p->mutex);
> +	pr_debug("Created event (id:0x%08x) (%s)\n", args->event_id, __func__);
>  	return err;
>  }
>  
> @@ -2208,6 +2165,41 @@ static int criu_dump_queues(struct kfd_process *p, struct kfd_ioctl_criu_dumper_
>  	return ret;
>  }
>  
> +static int criu_dump_events(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
> +{
> +	struct kfd_criu_event_bucket *ev_buckets;
> +	uint32_t num_events;
> +	int ret = 0;
> +
> +	num_events = kfd_get_num_events(p);
> +	if (args->num_objects != num_events) {
> +		pr_err("Mismatch with number of events (current:%d user:%lld)\n",
> +							num_events, args->num_objects);
> +		return -EINVAL;
> +	}
> +
> +	if (args->objects_size != args->num_objects *
> +				  (sizeof(*ev_buckets) + sizeof(struct kfd_criu_event_priv_data))) {
> +		pr_err("Invalid objects size for events\n");
> +		return -EINVAL;
> +	}
> +
> +	ev_buckets = kvzalloc(args->objects_size, GFP_KERNEL);
> +	if (!ev_buckets)
> +		return -ENOMEM;
> +
> +	ret = kfd_event_dump(p, ev_buckets, args->num_objects);
> +	if (!ret) {
> +		ret = copy_to_user((void __user *)args->objects, ev_buckets, args->objects_size);
> +		if (ret) {
> +			pr_err("Failed to copy events information to user\n");
> +			ret = -EFAULT;
> +		}
> +	}
> +	kvfree(ev_buckets);
> +	return ret;
> +}
> +
>  static int kfd_ioctl_criu_dumper(struct file *filep,
>  				struct kfd_process *p, void *data)
>  {
> @@ -2246,6 +2238,8 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
>  		ret = criu_dump_queues(p, args);
>  		break;
>  	case KFD_CRIU_OBJECT_TYPE_EVENT:
> +		ret = criu_dump_events(p, args);
> +		break;
>  	case KFD_CRIU_OBJECT_TYPE_DEVICE:
>  	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
>  	default:
> @@ -2676,6 +2670,40 @@ static int criu_restore_queues(struct kfd_process *p,
>  	return ret;
>  }
>  
> +static int criu_restore_events(struct file *filp, struct kfd_process *p,
> +			struct kfd_ioctl_criu_restorer_args *args)
> +{
> +	int ret = 0, i;
> +	uint8_t *objects, *private_data;
> +	struct kfd_criu_event_bucket *ev_buckets;
> +
> +	objects = kvzalloc(args->objects_size, GFP_KERNEL);
> +	if (!objects)
> +		return -ENOMEM;
> +
> +	ret = copy_from_user(objects, (void __user *)args->objects, args->objects_size);
> +	if (ret) {
> +		pr_err("Failed to copy event information from user\n");
> +		ret = -EFAULT;
> +		goto exit;
> +	}
> +
> +	ev_buckets = (struct kfd_criu_event_bucket *) objects;
> +	private_data = (void *)(ev_buckets + args->num_objects);
> +
> +	for (i = 0; i < args->num_objects; i++) {
> +		ret = kfd_event_restore(filp, p, &ev_buckets[i], private_data);
> +		if (ret) {
> +			pr_err("Failed to restore event (%d)\n", ret);
> +			goto exit;
> +		}
> +	}
> +
> +exit:
> +	kvfree(ev_buckets);
> +	return ret;
> +}
> +
>  static int kfd_ioctl_criu_restorer(struct file *filep,
>  				struct kfd_process *p, void *data)
>  {
> @@ -2698,6 +2726,8 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
>  		ret = criu_restore_queues(p, args);
>  		break;
>  	case KFD_CRIU_OBJECT_TYPE_EVENT:
> +		ret = criu_restore_events(filep, p, args);
> +		break;
>  	case KFD_CRIU_OBJECT_TYPE_DEVICE:
>  	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
>  	default:
> @@ -2799,9 +2829,13 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
>  	args->queues_priv_data_size = queues_extra_data_size +
>  				(args->total_queues * sizeof(struct kfd_criu_queue_priv_data));
>  
> -	dev_dbg(kfd_device, "Num of bos:%llu queues:%u\n",
> +	args->total_events = kfd_get_num_events(p);
> +	args->events_priv_data_size = args->total_events * sizeof(struct kfd_criu_event_priv_data);
> +
> +	dev_dbg(kfd_device, "Num of bos:%llu queues:%u events:%u\n",
>  				args->total_bos,
> -				args->total_queues);
> +				args->total_queues,
> +				args->total_events);
>  err_unlock:
>  	mutex_unlock(&p->mutex);
>  	return ret;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index ba2c2ce0c55a..18362478e351 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -53,9 +53,9 @@ struct kfd_signal_page {
>  	uint64_t *kernel_address;
>  	uint64_t __user *user_address;
>  	bool need_to_free_pages;
> +	uint64_t user_handle; /* Needed for CRIU dump and restore */
>  };
>  
> -
>  static uint64_t *page_slots(struct kfd_signal_page *page)
>  {
>  	return page->kernel_address;
> @@ -92,7 +92,8 @@ static struct kfd_signal_page *allocate_signal_page(struct kfd_process *p)
>  }
>  
>  static int allocate_event_notification_slot(struct kfd_process *p,
> -					    struct kfd_event *ev)
> +					    struct kfd_event *ev,
> +					    const int *restore_id)
>  {
>  	int id;
>  
> @@ -104,14 +105,19 @@ static int allocate_event_notification_slot(struct kfd_process *p,
>  		p->signal_mapped_size = 256*8;
>  	}
>  
> -	/*
> -	 * Compatibility with old user mode: Only use signal slots
> -	 * user mode has mapped, may be less than
> -	 * KFD_SIGNAL_EVENT_LIMIT. This also allows future increase
> -	 * of the event limit without breaking user mode.
> -	 */
> -	id = idr_alloc(&p->event_idr, ev, 0, p->signal_mapped_size / 8,
> -		       GFP_KERNEL);
> +	if (restore_id) {
> +		id = idr_alloc(&p->event_idr, ev, *restore_id, *restore_id + 1,
> +				GFP_KERNEL);
> +	} else {
> +		/*
> +		 * Compatibility with old user mode: Only use signal slots
> +		 * user mode has mapped, may be less than
> +		 * KFD_SIGNAL_EVENT_LIMIT. This also allows future increase
> +		 * of the event limit without breaking user mode.
> +		 */
> +		id = idr_alloc(&p->event_idr, ev, 0, p->signal_mapped_size / 8,
> +				GFP_KERNEL);
> +	}
>  	if (id < 0)
>  		return id;
>  
> @@ -178,9 +184,8 @@ static struct kfd_event *lookup_signaled_event_by_partial_id(
>  	return ev;
>  }
>  
> -static int create_signal_event(struct file *devkfd,
> -				struct kfd_process *p,
> -				struct kfd_event *ev)
> +static int create_signal_event(struct file *devkfd, struct kfd_process *p,
> +				struct kfd_event *ev, const int *restore_id)
>  {
>  	int ret;
>  
> @@ -193,7 +198,7 @@ static int create_signal_event(struct file *devkfd,
>  		return -ENOSPC;
>  	}
>  
> -	ret = allocate_event_notification_slot(p, ev);
> +	ret = allocate_event_notification_slot(p, ev, restore_id);
>  	if (ret) {
>  		pr_warn("Signal event wasn't created because out of kernel memory\n");
>  		return ret;
> @@ -209,16 +214,22 @@ static int create_signal_event(struct file *devkfd,
>  	return 0;
>  }
>  
> -static int create_other_event(struct kfd_process *p, struct kfd_event *ev)
> +static int create_other_event(struct kfd_process *p, struct kfd_event *ev, const int *restore_id)
>  {
> -	/* Cast KFD_LAST_NONSIGNAL_EVENT to uint32_t. This allows an
> -	 * intentional integer overflow to -1 without a compiler
> -	 * warning. idr_alloc treats a negative value as "maximum
> -	 * signed integer".
> -	 */
> -	int id = idr_alloc(&p->event_idr, ev, KFD_FIRST_NONSIGNAL_EVENT_ID,
> -			   (uint32_t)KFD_LAST_NONSIGNAL_EVENT_ID + 1,
> -			   GFP_KERNEL);
> +	int id;
> +
> +	if (restore_id)
> +		id = idr_alloc(&p->event_idr, ev, *restore_id, *restore_id + 1,
> +			GFP_KERNEL);
> +	else
> +		/* Cast KFD_LAST_NONSIGNAL_EVENT to uint32_t. This allows an
> +		 * intentional integer overflow to -1 without a compiler
> +		 * warning. idr_alloc treats a negative value as "maximum
> +		 * signed integer".
> +		 */
> +		id = idr_alloc(&p->event_idr, ev, KFD_FIRST_NONSIGNAL_EVENT_ID,
> +				(uint32_t)KFD_LAST_NONSIGNAL_EVENT_ID + 1,
> +				GFP_KERNEL);
>  
>  	if (id < 0)
>  		return id;
> @@ -295,8 +306,8 @@ static bool event_can_be_cpu_signaled(const struct kfd_event *ev)
>  	return ev->type == KFD_EVENT_TYPE_SIGNAL;
>  }
>  
> -int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
> -		       uint64_t size)
> +static int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
> +		       uint64_t size, uint64_t user_handle)
>  {
>  	struct kfd_signal_page *page;
>  
> @@ -315,10 +326,55 @@ int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
>  
>  	p->signal_page = page;
>  	p->signal_mapped_size = size;
> -
> +	p->signal_page->user_handle = user_handle;
>  	return 0;
>  }
>  
> +int kfd_kmap_event_page(struct kfd_process *p, uint64_t event_page_offset)

This function should be static. I also think that this function and
criu_dump/restore_events could be moved into kfd_events.c.

Regards,
  Felix
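
For illustration, a minimal sketch of the suggested shape (hedged; it
assumes the criu_dump/restore_events callers also move into
kfd_events.c, so no prototype is needed in kfd_priv.h):

	/* kfd_events.c: helper becomes file-local */
	static int kfd_kmap_event_page(struct kfd_process *p,
				       uint64_t event_page_offset)
	{
		/* body unchanged from the hunk below */
	}

The "int kfd_kmap_event_page(...)" prototype this patch adds to
kfd_priv.h would then be dropped.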


> +{
> +	struct kfd_dev *kfd;
> +	struct kfd_process_device *pdd;
> +	void *mem, *kern_addr;
> +	uint64_t size;
> +	int err = 0;
> +
> +	if (p->signal_page) {
> +		pr_err("Event page is already set\n");
> +		return -EINVAL;
> +	}
> +
> +	kfd = kfd_device_by_id(GET_GPU_ID(event_page_offset));
> +	if (!kfd) {
> +		pr_err("Getting device by id failed in %s\n", __func__);
> +		return -EINVAL;
> +	}
> +
> +	pdd = kfd_bind_process_to_device(kfd, p);
> +	if (IS_ERR(pdd))
> +		return PTR_ERR(pdd);
> +
> +	mem = kfd_process_device_translate_handle(pdd,
> +			GET_IDR_HANDLE(event_page_offset));
> +	if (!mem) {
> +		pr_err("Can't find BO, offset is 0x%llx\n", event_page_offset);
> +		return -EINVAL;
> +	}
> +
> +	err = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(kfd->kgd,
> +					mem, &kern_addr, &size);
> +	if (err) {
> +		pr_err("Failed to map event page to kernel\n");
> +		return err;
> +	}
> +
> +	err = kfd_event_page_set(p, kern_addr, size, event_page_offset);
> +	if (err) {
> +		pr_err("Failed to set event page\n");
> +		return err;
> +	}
> +	return err;
> +}
> +
>  int kfd_event_create(struct file *devkfd, struct kfd_process *p,
>  		     uint32_t event_type, bool auto_reset, uint32_t node_id,
>  		     uint32_t *event_id, uint32_t *event_trigger_data,
> @@ -343,14 +399,14 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
>  	switch (event_type) {
>  	case KFD_EVENT_TYPE_SIGNAL:
>  	case KFD_EVENT_TYPE_DEBUG:
> -		ret = create_signal_event(devkfd, p, ev);
> +		ret = create_signal_event(devkfd, p, ev, NULL);
>  		if (!ret) {
>  			*event_page_offset = KFD_MMAP_TYPE_EVENTS;
>  			*event_slot_index = ev->event_id;
>  		}
>  		break;
>  	default:
> -		ret = create_other_event(p, ev);
> +		ret = create_other_event(p, ev, NULL);
>  		break;
>  	}
>  
> @@ -366,6 +422,147 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
>  	return ret;
>  }
>  
> +int kfd_event_restore(struct file *devkfd, struct kfd_process *p,
> +		      struct kfd_criu_event_bucket *ev_bucket,
> +		      uint8_t *priv_datas)
> +{
> +	int ret = 0;
> +	struct kfd_criu_event_priv_data *ev_priv;
> +	struct kfd_event *ev;
> +
> +	ev_priv = (struct kfd_criu_event_priv_data *)(priv_datas + ev_bucket->priv_data_offset);
> +
> +	if (ev_priv->user_handle) {
> +		ret = kfd_kmap_event_page(p, ev_priv->user_handle);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ev = kzalloc(sizeof(*ev), GFP_KERNEL);
> +	if (!ev)
> +		return -ENOMEM;
> +
> +	ev->type = ev_priv->type;
> +	ev->auto_reset = ev_priv->auto_reset;
> +	ev->signaled = ev_priv->signaled;
> +
> +	init_waitqueue_head(&ev->wq);
> +
> +	mutex_lock(&p->event_mutex);
> +	switch (ev->type) {
> +	case KFD_EVENT_TYPE_SIGNAL:
> +	case KFD_EVENT_TYPE_DEBUG:
> +		ret = create_signal_event(devkfd, p, ev, &ev_priv->event_id);
> +		break;
> +	case KFD_EVENT_TYPE_MEMORY:
> +		memcpy(&ev->memory_exception_data,
> +			&ev_priv->memory_exception_data,
> +			sizeof(struct kfd_hsa_memory_exception_data));
> +
> +		ev->memory_exception_data.gpu_id = ev_bucket->gpu_id;
> +		ret = create_other_event(p, ev, &ev_priv->event_id);
> +		break;
> +	case KFD_EVENT_TYPE_HW_EXCEPTION:
> +		memcpy(&ev->hw_exception_data,
> +			&ev_priv->hw_exception_data,
> +			sizeof(struct kfd_hsa_hw_exception_data));
> +
> +		ev->hw_exception_data.gpu_id = ev_bucket->gpu_id;
> +		ret = create_other_event(p, ev, &ev_priv->event_id);
> +		break;
> +	}
> +
> +	if (ret)
> +		kfree(ev);
> +
> +	mutex_unlock(&p->event_mutex);
> +
> +	return ret;
> +}
> +
> +int kfd_event_dump(struct kfd_process *p,
> +		   struct kfd_criu_event_bucket *ev_buckets,
> +		   uint32_t num_events)
> +{
> +	struct kfd_event *ev;
> +	struct kfd_criu_event_priv_data *ev_privs;
> +	uint32_t ev_id;
> +	int i = 0;
> +
> +	/* Private data for first event starts after all ev_buckets */
> +	ev_privs = (struct kfd_criu_event_priv_data *)((uint8_t *)ev_buckets +
> +						   (num_events * (sizeof(*ev_buckets))));
> +
> +
> +		struct kfd_criu_event_bucket *ev_bucket;
> +		struct kfd_criu_event_priv_data *ev_priv;
> +
> +		if (i >= num_events) {
> +			pr_err("Number of events exceeds number allocated\n");
> +			return -ENOMEM;
> +		}
> +
> +		ev_bucket = &ev_buckets[i];
> +
> +		/* Currently, all events have the same size of private_data, but the
> +		 * current ioctls and CRIU plugin support private_data of variable sizes
> +		 */
> +		ev_priv = &ev_privs[i];
> +
> +		ev_bucket->priv_data_offset = i * sizeof(*ev_priv);
> +		ev_bucket->priv_data_size = sizeof(*ev_priv);
> +
> +		/* We store the user_handle with the first event */
> +		if (i == 0 && p->signal_page)
> +			ev_priv->user_handle = p->signal_page->user_handle;
> +
> +		ev_priv->event_id = ev->event_id;
> +		ev_priv->auto_reset = ev->auto_reset;
> +		ev_priv->type = ev->type;
> +		ev_priv->signaled = ev->signaled;
> +
> +		/* We store the gpu_id in the bucket section so that the userspace CRIU plugin can
> +		 * modify it if needed.
> +		 */
> +		if (ev_priv->type == KFD_EVENT_TYPE_MEMORY) {
> +			memcpy(&ev_priv->memory_exception_data,
> +				&ev->memory_exception_data,
> +				sizeof(struct kfd_hsa_memory_exception_data));
> +
> +			ev_bucket->gpu_id = ev_priv->memory_exception_data.gpu_id;
> +		} else if (ev_priv->type == KFD_EVENT_TYPE_HW_EXCEPTION) {
> +			memcpy(&ev_priv->hw_exception_data,
> +				&ev->hw_exception_data,
> +				sizeof(struct kfd_hsa_hw_exception_data));
> +
> +			ev_bucket->gpu_id = ev_priv->hw_exception_data.gpu_id;
> +		} else
> +			ev_bucket->gpu_id = 0;
> +
> +		pr_debug("Dumped event[%d] id = 0x%08x auto_reset = %x type = %x signaled = %x\n",
> +			  i,
> +			  ev_priv->event_id,
> +			  ev_priv->auto_reset,
> +			  ev_priv->type,
> +			  ev_priv->signaled);
> +		i++;
> +	}
> +	return 0;
> +}
> +
> +int kfd_get_num_events(struct kfd_process *p)
> +{
> +	struct kfd_event *ev;
> +	uint32_t id;
> +	uint32_t num_events = 0;
> +
> +	idr_for_each_entry(&p->event_idr, ev, id)
> +		num_events++;
> +
> +	return num_events;
> +}
> +
>  /* Assumes that p is current. */
>  int kfd_event_destroy(struct kfd_process *p, uint32_t event_id)
>  {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 7ed6f831109d..bf10a5305ef7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1069,9 +1069,26 @@ struct kfd_criu_queue_priv_data {
>  };
>  
>  struct kfd_criu_event_priv_data {
> -	uint64_t reserved;
> +	uint64_t user_handle;
> +	uint32_t event_id;
> +	uint32_t auto_reset;
> +	uint32_t type;
> +	uint32_t signaled;
> +
> +	union {
> +		struct kfd_hsa_memory_exception_data memory_exception_data;
> +		struct kfd_hsa_hw_exception_data hw_exception_data;
> +	};
>  };
>  
> +int kfd_event_restore(struct file *devkfd, struct kfd_process *p,
> +		      struct kfd_criu_event_bucket *ev_bucket,
> +		      uint8_t *priv_datas);
> +
> +int kfd_event_dump(struct kfd_process *p,
> +		   struct kfd_criu_event_bucket *ev_buckets,
> +		   uint32_t num_events);
> +
>  /* CRIU - End */
>  
>  /* Queue Context Management */
> @@ -1238,12 +1255,14 @@ void kfd_signal_iommu_event(struct kfd_dev *dev,
>  void kfd_signal_hw_exception_event(u32 pasid);
>  int kfd_set_event(struct kfd_process *p, uint32_t event_id);
>  int kfd_reset_event(struct kfd_process *p, uint32_t event_id);
> -int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
> -		       uint64_t size);
> +int kfd_kmap_event_page(struct kfd_process *p, uint64_t event_page_offset);
> +
>  int kfd_event_create(struct file *devkfd, struct kfd_process *p,
>  		     uint32_t event_type, bool auto_reset, uint32_t node_id,
>  		     uint32_t *event_id, uint32_t *event_trigger_data,
>  		     uint64_t *event_page_offset, uint32_t *event_slot_index);
> +
> +int kfd_get_num_events(struct kfd_process *p);
>  int kfd_event_destroy(struct kfd_process *p, uint32_t event_id);
>  
>  void kfd_signal_vm_fault_event(struct kfd_dev *dev, u32 pasid,

* Re: [PATCH 16/18] drm/amdkfd: CRIU implement gpu_id remapping
  2021-08-19 13:37 ` [PATCH 16/18] drm/amdkfd: CRIU implement gpu_id remapping David Yat Sin
@ 2021-08-23 18:48   ` Felix Kuehling
  0 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2021-08-23 18:48 UTC (permalink / raw)
  To: David Yat Sin, amd-gfx; +Cc: rajneesh.bhardwaj


On 2021-08-19 at 9:37 a.m., David Yat Sin wrote:
> When doing a restore on a different node, the gpu_ids on the restore
> node may be different. But the user space application will still use
> the original gpu_ids in its ioctl calls. Add code to create a gpu_id
> mapping so that KFD can determine the actual gpu_id during user
> ioctls.
>
> Signed-off-by: David Yat Sin <david.yatsin@amd.com>
> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 400 +++++++++++++++++------
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c  |   5 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  10 +
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c |  18 +
>  4 files changed, 324 insertions(+), 109 deletions(-)
>
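
For context, the remapping added here boils down to a per-process
translation table; a simplified sketch of the lookup (condensed from
the kfd_process_device_data_by_id() introduced at the end of this
patch):

	/* translate a checkpointed (user) gpu_id to the pdd that owns it */
	static struct kfd_process_device *
	lookup_pdd(struct kfd_process *p, uint32_t user_gpu_id)
	{
		int i;

		for (i = 0; i < p->n_pdds; i++)
			if (p->pdds[i]->user_gpu_id == user_gpu_id)
				return p->pdds[i];
		return NULL;
	}
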
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index c8f523d8ab81..90e4d4ce4398 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -294,13 +294,14 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>  		return err;
>  
>  	pr_debug("Looking for gpu id 0x%x\n", args->gpu_id);
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev) {
> +
> +	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
>  		pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);

You need to unlock p->mutex here (i.e. jump to an appropriate error
handling label).

Regards,
  Felix
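
A minimal sketch of the requested fix (the err_unlock label name is
hypothetical; the point is to drop p->mutex on the error path):

	mutex_lock(&p->mutex);
	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
	if (!pdd) {
		pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
		err = -EINVAL;
		goto err_unlock;
	}
	dev = pdd->dev;
	...
err_unlock:
	mutex_unlock(&p->mutex);
	return err;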


>  		return -EINVAL;
>  	}
> -
> -	mutex_lock(&p->mutex);
> +	dev = pdd->dev;
>  
>  	pdd = kfd_bind_process_to_device(dev, p);
>  	if (IS_ERR(pdd)) {
> @@ -491,7 +492,6 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
>  					struct kfd_process *p, void *data)
>  {
>  	struct kfd_ioctl_set_memory_policy_args *args = data;
> -	struct kfd_dev *dev;
>  	int err = 0;
>  	struct kfd_process_device *pdd;
>  	enum cache_policy default_policy, alternate_policy;
> @@ -506,13 +506,15 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
>  		return -EINVAL;
>  	}
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> -		return -EINVAL;
> -
>  	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
> +		pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
> +		err = -EINVAL;
> +		goto out;
> +	}
>  
> -	pdd = kfd_bind_process_to_device(dev, p);
> +	pdd = kfd_bind_process_to_device(pdd->dev, p);
>  	if (IS_ERR(pdd)) {
>  		err = -ESRCH;
>  		goto out;
> @@ -525,7 +527,7 @@ static int kfd_ioctl_set_memory_policy(struct file *filep,
>  		(args->alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
>  		   ? cache_policy_coherent : cache_policy_noncoherent;
>  
> -	if (!dev->dqm->ops.set_cache_memory_policy(dev->dqm,
> +	if (!pdd->dev->dqm->ops.set_cache_memory_policy(pdd->dev->dqm,
>  				&pdd->qpd,
>  				default_policy,
>  				alternate_policy,
> @@ -543,17 +545,18 @@ static int kfd_ioctl_set_trap_handler(struct file *filep,
>  					struct kfd_process *p, void *data)
>  {
>  	struct kfd_ioctl_set_trap_handler_args *args = data;
> -	struct kfd_dev *dev;
>  	int err = 0;
>  	struct kfd_process_device *pdd;
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> -		return -EINVAL;
> -
>  	mutex_lock(&p->mutex);
>  
> -	pdd = kfd_bind_process_to_device(dev, p);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
> +	pdd = kfd_bind_process_to_device(pdd->dev, p);
>  	if (IS_ERR(pdd)) {
>  		err = -ESRCH;
>  		goto out;
> @@ -577,16 +580,20 @@ static int kfd_ioctl_dbg_register(struct file *filep,
>  	bool create_ok;
>  	long status = 0;
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> -		return -EINVAL;
> +	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
> +		status = -EINVAL;
> +		goto out_unlock_p;
> +	}
> +	dev = pdd->dev;
>  
>  	if (dev->device_info->asic_family == CHIP_CARRIZO) {
>  		pr_debug("kfd_ioctl_dbg_register not supported on CZ\n");
> -		return -EINVAL;
> +		status = -EINVAL;
> +		goto out_unlock_p;
>  	}
>  
> -	mutex_lock(&p->mutex);
>  	mutex_lock(kfd_get_dbgmgr_mutex());
>  
>  	/*
> @@ -596,7 +603,7 @@ static int kfd_ioctl_dbg_register(struct file *filep,
>  	pdd = kfd_bind_process_to_device(dev, p);
>  	if (IS_ERR(pdd)) {
>  		status = PTR_ERR(pdd);
> -		goto out;
> +		goto out_unlock_dbg;
>  	}
>  
>  	if (!dev->dbgmgr) {
> @@ -614,8 +621,9 @@ static int kfd_ioctl_dbg_register(struct file *filep,
>  		status = -EINVAL;
>  	}
>  
> -out:
> +out_unlock_dbg:
>  	mutex_unlock(kfd_get_dbgmgr_mutex());
> +out_unlock_p:
>  	mutex_unlock(&p->mutex);
>  
>  	return status;
> @@ -625,12 +633,18 @@ static int kfd_ioctl_dbg_unregister(struct file *filep,
>  				struct kfd_process *p, void *data)
>  {
>  	struct kfd_ioctl_dbg_unregister_args *args = data;
> +	struct kfd_process_device *pdd;
>  	struct kfd_dev *dev;
>  	long status;
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev || !dev->dbgmgr)
> +	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd || !pdd->dev->dbgmgr) {
> +		mutex_unlock(&p->mutex);
>  		return -EINVAL;
> +	}
> +	dev = pdd->dev;
> +	mutex_unlock(&p->mutex);
>  
>  	if (dev->device_info->asic_family == CHIP_CARRIZO) {
>  		pr_debug("kfd_ioctl_dbg_unregister not supported on CZ\n");
> @@ -664,6 +678,7 @@ static int kfd_ioctl_dbg_address_watch(struct file *filep,
>  {
>  	struct kfd_ioctl_dbg_address_watch_args *args = data;
>  	struct kfd_dev *dev;
> +	struct kfd_process_device *pdd;
>  	struct dbg_address_watch_info aw_info;
>  	unsigned char *args_buff;
>  	long status;
> @@ -673,9 +688,15 @@ static int kfd_ioctl_dbg_address_watch(struct file *filep,
>  
>  	memset((void *) &aw_info, 0, sizeof(struct dbg_address_watch_info));
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> +	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
> +		mutex_unlock(&p->mutex);
> +		pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
>  		return -EINVAL;
> +	}
> +	dev = pdd->dev;
> +	mutex_unlock(&p->mutex);
>  
>  	if (dev->device_info->asic_family == CHIP_CARRIZO) {
>  		pr_debug("kfd_ioctl_dbg_wave_control not supported on CZ\n");
> @@ -764,6 +785,7 @@ static int kfd_ioctl_dbg_wave_control(struct file *filep,
>  {
>  	struct kfd_ioctl_dbg_wave_control_args *args = data;
>  	struct kfd_dev *dev;
> +	struct kfd_process_device *pdd;
>  	struct dbg_wave_control_info wac_info;
>  	unsigned char *args_buff;
>  	uint32_t computed_buff_size;
> @@ -781,9 +803,15 @@ static int kfd_ioctl_dbg_wave_control(struct file *filep,
>  				sizeof(wac_info.dbgWave_msg.MemoryVA) +
>  				sizeof(wac_info.trapId);
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> +	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
> +		mutex_unlock(&p->mutex);
> +		pr_debug("Could not find gpu id 0x%x\n", args->gpu_id);
>  		return -EINVAL;
> +	}
> +	dev = pdd->dev;
> +	mutex_unlock(&p->mutex);
>  
>  	if (dev->device_info->asic_family == CHIP_CARRIZO) {
>  		pr_debug("kfd_ioctl_dbg_wave_control not supported on CZ\n");
> @@ -847,16 +875,19 @@ static int kfd_ioctl_get_clock_counters(struct file *filep,
>  				struct kfd_process *p, void *data)
>  {
>  	struct kfd_ioctl_get_clock_counters_args *args = data;
> -	struct kfd_dev *dev;
> +	struct kfd_process_device *pdd;
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (dev)
> +	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (pdd)
>  		/* Reading GPU clock counter from KGD */
> -		args->gpu_clock_counter = amdgpu_amdkfd_get_gpu_clock_counter(dev->kgd);
> +		args->gpu_clock_counter = amdgpu_amdkfd_get_gpu_clock_counter(pdd->dev->kgd);
>  	else
>  		/* Node without GPU resource */
>  		args->gpu_clock_counter = 0;
>  
> +	mutex_unlock(&p->mutex);
> +
>  	/* No access to rdtsc. Using raw monotonic time */
>  	args->cpu_clock_counter = ktime_get_raw_ns();
>  	args->system_clock_counter = ktime_get_boottime_ns();
> @@ -1070,11 +1101,13 @@ static int kfd_ioctl_set_scratch_backing_va(struct file *filep,
>  	struct kfd_dev *dev;
>  	long err;
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> -		return -EINVAL;
> -
>  	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
> +		err = -EINVAL;
> +		goto bind_process_to_device_fail;
> +	}
> +	dev = pdd->dev;
>  
>  	pdd = kfd_bind_process_to_device(dev, p);
>  	if (IS_ERR(pdd)) {
> @@ -1102,15 +1135,20 @@ static int kfd_ioctl_get_tile_config(struct file *filep,
>  		struct kfd_process *p, void *data)
>  {
>  	struct kfd_ioctl_get_tile_config_args *args = data;
> -	struct kfd_dev *dev;
> +	struct kfd_process_device *pdd;
>  	struct tile_config config;
>  	int err = 0;
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> +	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
> +		mutex_unlock(&p->mutex);
>  		return -EINVAL;
> +	}
>  
> -	amdgpu_amdkfd_get_tile_config(dev->kgd, &config);
> +	amdgpu_amdkfd_get_tile_config(pdd->dev->kgd, &config);
> +
> +	mutex_unlock(&p->mutex);
>  
>  	args->gb_addr_config = config.gb_addr_config;
>  	args->num_banks = config.num_banks;
> @@ -1145,21 +1183,15 @@ static int kfd_ioctl_acquire_vm(struct file *filep, struct kfd_process *p,
>  {
>  	struct kfd_ioctl_acquire_vm_args *args = data;
>  	struct kfd_process_device *pdd;
> -	struct kfd_dev *dev;
>  	struct file *drm_file;
>  	int ret;
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> -		return -EINVAL;
> -
>  	drm_file = fget(args->drm_fd);
>  	if (!drm_file)
>  		return -EINVAL;
>  
>  	mutex_lock(&p->mutex);
> -
> -	pdd = kfd_get_process_device_data(dev, p);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
>  	if (!pdd) { 
>  		ret = -EINVAL;
>  		goto err_unlock;
> @@ -1218,19 +1250,23 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
>  	if (args->size == 0)
>  		return -EINVAL;
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> -		return -EINVAL;
> +	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
> +		err = -EINVAL;
> +		goto err_unlock;
> +	}
> +
> +	dev = pdd->dev;
>  
>  	if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) &&
>  		(flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) &&
>  		!kfd_dev_is_large_bar(dev)) {
>  		pr_err("Alloc host visible vram on small bar is not allowed\n");
> -		return -EINVAL;
> +		err = -EINVAL;
> +		goto err_unlock;
>  	}
>  
> -	mutex_lock(&p->mutex);
> -
>  	pdd = kfd_bind_process_to_device(dev, p);
>  	if (IS_ERR(pdd)) {
>  		err = PTR_ERR(pdd);
> @@ -1301,17 +1337,12 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
>  	struct kfd_ioctl_free_memory_of_gpu_args *args = data;
>  	struct kfd_process_device *pdd;
>  	void *mem;
> -	struct kfd_dev *dev;
>  	int ret;
>  	uint64_t size = 0;
>  
> -	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
> -	if (!dev)
> -		return -EINVAL;
> -
>  	mutex_lock(&p->mutex);
>  
> -	pdd = kfd_get_process_device_data(dev, p);
> +	pdd = kfd_process_device_data_by_id(p, GET_GPU_ID(args->handle));
>  	if (!pdd) {
>  		pr_err("Process device data doesn't exist\n");
>  		ret = -EINVAL;
> @@ -1325,7 +1356,7 @@ static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
>  		goto err_unlock;
>  	}
>  
> -	ret = amdgpu_amdkfd_gpuvm_free_memory_of_gpu(dev->kgd,
> +	ret = amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd,
>  				(struct kgd_mem *)mem, pdd->drm_priv, &size);
>  
>  	/* If freeing the buffer failed, leave the handle in place for
> @@ -1348,15 +1379,11 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>  	struct kfd_ioctl_map_memory_to_gpu_args *args = data;
>  	struct kfd_process_device *pdd, *peer_pdd;
>  	void *mem;
> -	struct kfd_dev *dev, *peer;
> +	struct kfd_dev *dev;
>  	long err = 0;
>  	int i;
>  	uint32_t *devices_arr = NULL;
>  
> -	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
> -	if (!dev)
> -		return -EINVAL;
> -
>  	if (!args->n_devices) {
>  		pr_debug("Device IDs array empty\n");
>  		return -EINVAL;
> @@ -1380,6 +1407,12 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>  	}
>  
>  	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, GET_GPU_ID(args->handle));
> +	if (!pdd) {
> +		err = -EINVAL;
> +		goto get_process_device_data_failed;
> +	}
> +	dev = pdd->dev;
>  
>  	pdd = kfd_bind_process_to_device(dev, p);
>  	if (IS_ERR(pdd)) {
> @@ -1395,21 +1428,21 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>  	}
>  
>  	for (i = args->n_success; i < args->n_devices; i++) {
> -		peer = kfd_device_by_id(devices_arr[i]);
> -		if (!peer) {
> +		peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]);
> +		if (!peer_pdd) {
>  			pr_debug("Getting device by id failed for 0x%x\n",
>  				 devices_arr[i]);
>  			err = -EINVAL;
>  			goto get_mem_obj_from_handle_failed;
>  		}
>  
> -		peer_pdd = kfd_bind_process_to_device(peer, p);
> +		peer_pdd = kfd_bind_process_to_device(peer_pdd->dev, p);
>  		if (IS_ERR(peer_pdd)) {
>  			err = PTR_ERR(peer_pdd);
>  			goto get_mem_obj_from_handle_failed;
>  		}
>  		err = amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> -			peer->kgd, (struct kgd_mem *)mem, peer_pdd->drm_priv);
> +			peer_pdd->dev->kgd, (struct kgd_mem *)mem, peer_pdd->drm_priv);
>  		if (err) {
>  			pr_err("Failed to map to gpu %d/%d\n",
>  			       i, args->n_devices);
> @@ -1428,12 +1461,10 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>  
>  	/* Flush TLBs after waiting for the page table updates to complete */
>  	for (i = 0; i < args->n_devices; i++) {
> -		peer = kfd_device_by_id(devices_arr[i]);
> -		if (WARN_ON_ONCE(!peer))
> -			continue;
> -		peer_pdd = kfd_get_process_device_data(peer, p);
> +		peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]);
>  		if (WARN_ON_ONCE(!peer_pdd))
>  			continue;
> +
>  		kfd_flush_tlb(peer_pdd);
>  	}
>  
> @@ -1441,6 +1472,7 @@ static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
>  
>  	return err;
>  
> +get_process_device_data_failed:
>  bind_process_to_device_failed:
>  get_mem_obj_from_handle_failed:
>  map_memory_to_gpu_failed:
> @@ -1458,14 +1490,9 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>  	struct kfd_ioctl_unmap_memory_from_gpu_args *args = data;
>  	struct kfd_process_device *pdd, *peer_pdd;
>  	void *mem;
> -	struct kfd_dev *dev, *peer;
>  	long err = 0;
>  	uint32_t *devices_arr = NULL, i;
>  
> -	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
> -	if (!dev)
> -		return -EINVAL;
> -
>  	if (!args->n_devices) {
>  		pr_debug("Device IDs array empty\n");
>  		return -EINVAL;
> @@ -1489,8 +1516,7 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>  	}
>  
>  	mutex_lock(&p->mutex);
> -
> -	pdd = kfd_get_process_device_data(dev, p);
> +	pdd = kfd_process_device_data_by_id(p, GET_GPU_ID(args->handle));
>  	if (!pdd) {
>  		err = -EINVAL;
>  		goto bind_process_to_device_failed;
> @@ -1504,19 +1530,13 @@ static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
>  	}
>  
>  	for (i = args->n_success; i < args->n_devices; i++) {
> -		peer = kfd_device_by_id(devices_arr[i]);
> -		if (!peer) {
> -			err = -EINVAL;
> -			goto get_mem_obj_from_handle_failed;
> -		}
> -
> -		peer_pdd = kfd_get_process_device_data(peer, p);
> +		peer_pdd = kfd_process_device_data_by_id(p, devices_arr[i]);
>  		if (!peer_pdd) {
> -			err = -ENODEV;
> +			err = -EINVAL;
>  			goto get_mem_obj_from_handle_failed;
>  		}
>  		err = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
> -			peer->kgd, (struct kgd_mem *)mem, peer_pdd->drm_priv);
> +			peer_pdd->dev->kgd, (struct kgd_mem *)mem, peer_pdd->drm_priv);
>  		if (err) {
>  			pr_err("Failed to unmap from gpu %d/%d\n",
>  			       i, args->n_devices);
> @@ -1645,23 +1665,26 @@ static int kfd_ioctl_import_dmabuf(struct file *filep,
>  	void *mem;
>  	int r;
>  
> -	dev = kfd_device_by_id(args->gpu_id);
> -	if (!dev)
> -		return -EINVAL;
> +	mutex_lock(&p->mutex);
> +	pdd = kfd_process_device_data_by_id(p, args->gpu_id);
> +	if (!pdd) {
> +		r = -EINVAL;
> +		goto err_unlock;
> +	}
>  
>  	dmabuf = dma_buf_get(args->dmabuf_fd);
> -	if (IS_ERR(dmabuf))
> -		return PTR_ERR(dmabuf);
> -
> -	mutex_lock(&p->mutex);
> +	if (IS_ERR(dmabuf)) {
> +		r = PTR_ERR(dmabuf);
> +		goto err_unlock;
> +	}
>  
> -	pdd = kfd_bind_process_to_device(dev, p);
> +	pdd = kfd_bind_process_to_device(pdd->dev, p);
>  	if (IS_ERR(pdd)) {
>  		r = PTR_ERR(pdd);
>  		goto err_unlock;
>  	}
>  
> -	r = amdgpu_amdkfd_gpuvm_import_dmabuf(dev->kgd, dmabuf,
> +	r = amdgpu_amdkfd_gpuvm_import_dmabuf(pdd->dev->kgd, dmabuf,
>  					      args->va_addr, pdd->drm_priv,
>  					      (struct kgd_mem **)&mem, &size,
>  					      NULL);
> @@ -1695,13 +1718,19 @@ static int kfd_ioctl_smi_events(struct file *filep,
>  				struct kfd_process *p, void *data)
>  {
>  	struct kfd_ioctl_smi_events_args *args = data;
> -	struct kfd_dev *dev;
> +	struct kfd_process_device *pdd;
>  
> -	dev = kfd_device_by_id(args->gpuid);
> -	if (!dev)
> +	mutex_lock(&p->mutex);
> +
> +	pdd = kfd_process_device_data_by_id(p, args->gpuid);
> +	if (!pdd) {
> +		mutex_unlock(&p->mutex);
>  		return -EINVAL;
> +	}
>  
> -	return kfd_smi_event_open(dev, &args->anon_fd);
> +	mutex_unlock(&p->mutex);
> +
> +	return kfd_smi_event_open(pdd->dev, &args->anon_fd);
>  }
>  
>  static int kfd_ioctl_set_xnack_mode(struct file *filep,
> @@ -1800,6 +1829,57 @@ static int criu_dump_process(struct kfd_process *p, struct kfd_ioctl_criu_dumper
>  	return ret;
>  }
>  
> +static int criu_dump_devices(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
> +{
> +	struct kfd_criu_device_bucket *device_buckets;
> +	int ret = 0, i;
> +
> +	if (args->num_objects != p->n_pdds) {
> +		pr_err("Mismatch with number of devices (current:%d user:%lld)\n",
> +							p->n_pdds, args->num_objects);
> +		return -EINVAL;
> +	}
> +
> +	if (args->objects_size != args->num_objects *
> +		(sizeof(*device_buckets) + sizeof(struct kfd_criu_device_priv_data))) {
> +		pr_err("Invalid objects size for devices\n");
> +		return -EINVAL;
> +	}
> +
> +	device_buckets = kvzalloc(args->objects_size, GFP_KERNEL);
> +	if (!device_buckets)
> +		return -ENOMEM;
> +
> +	/* Private data for devices is not currently used. To set private data:
> +	 * struct kfd_criu_device_priv_data *device_privs =
> +	 *		(struct kfd_criu_device_priv_data *)((uint8_t *)device_buckets +
> +	 *		(args->num_objects * sizeof(*device_buckets)));
> +	 */
> +
> +	for (i = 0; i < args->num_objects; i++) {
> +		struct kfd_process_device *pdd = p->pdds[i];
> +
> +		device_buckets[i].user_gpu_id = pdd->user_gpu_id;
> +		device_buckets[i].actual_gpu_id = pdd->dev->id;
> +
> +		/* priv_data does not contain useful information for now and is reserved for
> +		 * future use, so we do not set its contents
> +		 */
> +		device_buckets[i].priv_data_offset = i * sizeof(struct kfd_criu_device_priv_data);
> +		device_buckets[i].priv_data_size = sizeof(struct kfd_criu_device_priv_data);
> +	}
> +
> +	ret = copy_to_user((void __user *)args->objects, device_buckets, args->objects_size);
> +
> +	if (ret) {
> +		pr_err("Failed to copy device information to user\n");
> +		ret = -EFAULT;
> +	}
> +
> +	kvfree(device_buckets);
> +	return ret;
> +}
> +
>  uint64_t get_process_num_bos(struct kfd_process *p)
>  {
>  	uint64_t num_of_bos = 0, i;
> @@ -2231,6 +2311,9 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
>  	case KFD_CRIU_OBJECT_TYPE_PROCESS:
>  		ret = criu_dump_process(p, args);
>  		break;
> +	case KFD_CRIU_OBJECT_TYPE_DEVICE:
> +		ret = criu_dump_devices(p, args);
> +		break;
>  	case KFD_CRIU_OBJECT_TYPE_BO:
>  		ret = criu_dump_bos(p, args);
>  		break;
> @@ -2240,7 +2323,6 @@ static int kfd_ioctl_criu_dumper(struct file *filep,
>  	case KFD_CRIU_OBJECT_TYPE_EVENT:
>  		ret = criu_dump_events(p, args);
>  		break;
> -	case KFD_CRIU_OBJECT_TYPE_DEVICE:
>  	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
>  	default:
>  		pr_err("Unsupported object type:%d\n", args->type);
> @@ -2301,6 +2383,102 @@ static int criu_restore_process(struct kfd_process *p, struct kfd_ioctl_criu_res
>  	return ret;
>  }
>  
> +static int criu_restore_devices(struct kfd_process *p, struct kfd_ioctl_criu_restorer_args *args)
> +{
> +	int ret = 0, i;
> +	uint8_t *objects;
> +	struct kfd_criu_device_bucket *device_buckets;
> +
> +	if (args->num_objects != p->n_pdds)
> +		return -EINVAL;
> +
> +	if (args->objects_size != args->num_objects *
> +		(sizeof(*device_buckets) + sizeof(struct kfd_criu_device_priv_data))) {
> +		pr_err("Invalid objects size for devices\n");
> +		return -EINVAL;
> +	}
> +
> +	objects = kmalloc(args->objects_size, GFP_KERNEL);
> +	if (!objects)
> +		return -ENOMEM;
> +
> +	ret = copy_from_user(objects, (void __user *)args->objects, args->objects_size);
> +	if (ret) {
> +		pr_err("Failed to copy devices information from user\n");
> +		ret = -EFAULT;
> +		goto exit;
> +	}
> +
> +	device_buckets = (struct kfd_criu_device_bucket *) objects;
> +
> +	for (i = 0; i < args->num_objects; i++) {
> +		struct kfd_dev *dev;
> +		struct kfd_process_device *pdd;
> +		struct file *drm_file;
> +
> +		/* device private data is not currently used. To access device private data:
> +		 * uint8_t *private_datas = objects +
> +		 *				(args->num_objects * sizeof(*device_buckets));
> +		 *
> +		 * struct kfd_criu_device_priv_data *device_priv =
> +		 *			(struct kfd_criu_device_priv_data*)
> +		 *			(private_datas + device_buckets[i].priv_data_offset);
> +		 */
> +
> +		dev = kfd_device_by_id(device_buckets[i].actual_gpu_id);
> +		if (!dev) {
> +			pr_err("Failed to find device with gpu_id = %x\n",
> +				device_buckets[i].actual_gpu_id);
> +			ret = -EINVAL;
> +			goto exit;
> +		}
> +
> +		pdd = kfd_get_process_device_data(dev, p);
> +		if (!pdd) {
> +			pr_err("Failed to get pdd for gpu_id = %x\n",
> +					device_buckets[i].actual_gpu_id);
> +			ret = -EINVAL;
> +			goto exit;
> +		}
> +		pdd->user_gpu_id = device_buckets[i].user_gpu_id;
> +
> +		drm_file = fget(device_buckets[i].drm_fd);
> +		if (!drm_file) {
> +			pr_err("Invalid render node file descriptor sent from plugin (%d)\n",
> +				device_buckets[i].drm_fd);
> +			ret = -EINVAL;
> +			goto exit;
> +		}
> +
> +		if (pdd->drm_file) {
> +			/* don't leak the reference taken by fget() above */
> +			fput(drm_file);
> +			ret = -EINVAL;
> +			goto exit;
> +		}
> +
> +		/* create the vm using render nodes for kfd pdd */
> +		if (kfd_process_device_init_vm(pdd, drm_file)) {
> +			pr_err("could not init vm for given pdd\n");
> +			/* On success, the PDD keeps the drm_file reference */
> +			fput(drm_file);
> +			ret = -EINVAL;
> +			goto exit;
> +		}
> +		/*
> +		 * pdd already has the vm bound to the render node, so the call
> +		 * below won't create a new exclusive kfd mapping but will use
> +		 * the existing one with renderDXXX. It is still needed for
> +		 * iommu v2 binding and runtime pm.
> +		 */
> +		pdd = kfd_bind_process_to_device(dev, p);
> +		if (IS_ERR(pdd)) {
> +			ret = PTR_ERR(pdd);
> +			goto exit;
> +		}
> +	}
> +exit:
> +	kvfree(objects);
> +	return ret;
> +}
> +
>  static int criu_restore_bos(struct kfd_process *p, struct kfd_ioctl_criu_restorer_args *args)
>  {
>  	uint8_t *objects, *private_data;
> @@ -2719,6 +2897,9 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
>  	case KFD_CRIU_OBJECT_TYPE_PROCESS:
>  		ret = criu_restore_process(p, args);
>  		break;
> +	case KFD_CRIU_OBJECT_TYPE_DEVICE:
> +		ret = criu_restore_devices(p, args);
> +		break;
>  	case KFD_CRIU_OBJECT_TYPE_BO:
>  		ret = criu_restore_bos(p, args);
>  		break;
> @@ -2728,7 +2909,6 @@ static int kfd_ioctl_criu_restorer(struct file *filep,
>  	case KFD_CRIU_OBJECT_TYPE_EVENT:
>  		ret = criu_restore_events(filep, p, args);
>  		break;
> -	case KFD_CRIU_OBJECT_TYPE_DEVICE:
>  	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
>  	default:
>  		pr_err("Unsupported object type:%d\n", args->type);
> @@ -2819,6 +2999,11 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
>  
>  	args->process_priv_data_size = sizeof(struct kfd_criu_process_priv_data);
>  
> +	args->total_devices = p->n_pdds;
> +	/* devices_priv_data_size does not contain any useful information for now */
> +	args->devices_priv_data_size = args->total_devices *
> +					sizeof(struct kfd_criu_device_priv_data);
> +
>  	args->total_bos = get_process_num_bos(p);
>  	args->bos_priv_data_size = args->total_bos * sizeof(struct kfd_criu_bo_priv_data);
>  
> @@ -2832,7 +3017,8 @@ static int kfd_ioctl_criu_process_info(struct file *filep,
>  	args->total_events = kfd_get_num_events(p);
>  	args->events_priv_data_size = args->total_events * sizeof(struct kfd_criu_event_priv_data);
>  
> -	dev_dbg(kfd_device, "Num of bos:%llu queues:%u events:%u\n",
> +	dev_dbg(kfd_device, "Num of devices:%u bos:%llu queues:%u events:%u\n",
> +				args->total_devices,
>  				args->total_bos,
>  				args->total_queues,
>  				args->total_events);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 18362478e351..5e9067b70908 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -343,11 +343,12 @@ int kfd_kmap_event_page(struct kfd_process *p, uint64_t event_page_offset)
>  		return -EINVAL;
>  	}
>  
> -	kfd = kfd_device_by_id(GET_GPU_ID(event_page_offset));
> -	if (!kfd) {
> +	pdd = kfd_process_device_data_by_id(p, GET_GPU_ID(event_page_offset));
> +	if (!pdd) {
>  		pr_err("Getting device by id failed in %s\n", __func__);
>  		return -EINVAL;
>  	}
> +	kfd = pdd->dev;
>  
>  	pdd = kfd_bind_process_to_device(kfd, p);
>  	if (IS_ERR(pdd))
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index bf10a5305ef7..1912df8d9101 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -759,6 +759,13 @@ struct kfd_process_device {
>  	 *  number of CU's a device has along with number of other competing processes
>  	 */
>  	struct attribute attr_cu_occupancy;
> +
> +	/*
> +	 * If this process has been checkpointed before, then the user
> +	 * application will use the original gpu_id on the
> +	 * checkpointed node to refer to this device.
> +	 */
> +	uint32_t user_gpu_id;
>  };
>  
>  #define qpd_to_pdd(x) container_of(x, struct kfd_process_device, qpd)
> @@ -914,6 +921,9 @@ int kfd_process_restore_queues(struct kfd_process *p);
>  void kfd_suspend_all_processes(void);
>  int kfd_resume_all_processes(void);
>  
> +struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *process,
> +				uint32_t gpu_id);
> +
>  int kfd_process_device_init_vm(struct kfd_process_device *pdd,
>  			       struct file *drm_file);
>  struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index e4cb2f778590..a23f2162eb8b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -1425,6 +1425,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>  	pdd->runtime_inuse = false;
>  	pdd->vram_usage = 0;
>  	pdd->sdma_past_activity_counter = 0;
> +	pdd->user_gpu_id = dev->id;
>  	atomic64_set(&pdd->evict_duration_counter, 0);
>  	p->pdds[p->n_pdds++] = pdd;
>  
> @@ -1898,6 +1899,23 @@ void kfd_flush_tlb(struct kfd_process_device *pdd)
>  	}
>  }
>  
> +struct kfd_process_device *kfd_process_device_data_by_id(struct kfd_process *p, uint32_t gpu_id)
> +{
> +	int i;
> +
> +	if (gpu_id) {
> +		for (i = 0; i < p->n_pdds; i++) {
> +			struct kfd_process_device *pdd = p->pdds[i];
> +
> +			if (pdd->user_gpu_id == gpu_id)
> +				return pdd;
> +		}
> +
> +		WARN_ONCE(1, "Failed to find mapping for gpu = 0x%x\n",  gpu_id);
> +	}
> +	return NULL;
> +}
> +
>  #if defined(CONFIG_DEBUG_FS)
>  
>  int kfd_debugfs_mqds_by_process(struct seq_file *m, void *data)

* Re: [PATCH 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl
  2021-08-19 13:37 ` [PATCH 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl David Yat Sin
@ 2021-08-23 18:53   ` Felix Kuehling
  0 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2021-08-23 18:53 UTC (permalink / raw)
  To: David Yat Sin, amd-gfx; +Cc: rajneesh.bhardwaj

You haven't implemented objects_index_start yet. I think this is only
important later on for dumping BOs with dmabuf handles to avoid
exhausting the file-descriptor limit. For now, there should at least be
a check for objects_index_start == 0. We can fail if it's not 0 and
implement that support later. But allowing non-0 values now without
implementing them could lead to ABI breakages later on.

Regards,
  Felix
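
A minimal sketch of the suggested guard (hedged; it assumes
objects_index_start is a field of the dumper ioctl args as discussed
in this series):

	if (args->objects_index_start != 0) {
		pr_err("objects_index_start must be 0 (not implemented)\n");
		return -EINVAL;
	}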


On 2021-08-19 at 9:37 a.m., David Yat Sin wrote:
> From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
>
> This adds support to discover the buffer objects that belong to a
> process being checkpointed. The data corresponding to these buffer
> objects is returned to the user space plugin running under the CRIU
> master context, which then stores this info to recreate these buffer
> objects during a restore operation.
>
> Signed-off-by: David Yat Sin <david.yatsin@amd.com>
> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
> (cherry picked from commit 1f114a541bd21873de905db64bb9efa673274d4b)
> (cherry picked from commit 20c435fad57d3201e5402e38ae778f1f0f84a09d)
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c  |  20 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h  |   2 +
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 182 ++++++++++++++++++++++-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |   3 +-
>  4 files changed, 204 insertions(+), 3 deletions(-)
>
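
For orientation, the objects buffer filled by this dumper has a simple
packed layout (a reading aid inferred from criu_dump_bos() below, not
new ABI):

	/*
	 * args->objects for KFD_CRIU_OBJECT_TYPE_BO, N == args->num_objects:
	 *
	 *   kfd_criu_bo_bucket[0] ... kfd_criu_bo_bucket[N-1]
	 *   kfd_criu_bo_priv_data[0] ... kfd_criu_bo_priv_data[N-1]
	 *
	 * with bucket[i].priv_data_offset == i * sizeof(kfd_criu_bo_priv_data)
	 */
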
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 7e7d8330d64b..99ea29fd12bd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1181,6 +1181,26 @@ static void amdgpu_ttm_tt_unpopulate(struct ttm_device *bdev,
>  	return ttm_pool_free(&adev->mman.bdev.pool, ttm);
>  }
>  
> +/**
> + * amdgpu_ttm_tt_get_userptr - Return the userptr GTT ttm_tt for the current
> + * task
> + *
> + * @tbo: The ttm_buffer_object that contains the userptr
> + * @user_addr:  The returned value
> + */
> +int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
> +			      uint64_t *user_addr)
> +{
> +	struct amdgpu_ttm_tt *gtt;
> +
> +	if (!tbo->ttm)
> +		return -EINVAL;
> +
> +	gtt = (void *)tbo->ttm;
> +	*user_addr = gtt->userptr;
> +	return 0;
> +}
> +
>  /**
>   * amdgpu_ttm_tt_set_userptr - Initialize userptr GTT ttm_tt for the current
>   * task
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 9e38475e0f8d..dddd76f7a92e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -168,6 +168,8 @@ static inline bool amdgpu_ttm_tt_get_user_pages_done(struct ttm_tt *ttm)
>  #endif
>  
>  void amdgpu_ttm_tt_set_user_pages(struct ttm_tt *ttm, struct page **pages);
> +int amdgpu_ttm_tt_get_userptr(const struct ttm_buffer_object *tbo,
> +			      uint64_t *user_addr);
>  int amdgpu_ttm_tt_set_userptr(struct ttm_buffer_object *bo,
>  			      uint64_t addr, uint32_t flags);
>  bool amdgpu_ttm_tt_has_userptr(struct ttm_tt *ttm);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 09e2d30515e2..d548e6691d69 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -42,6 +42,7 @@
>  #include "kfd_svm.h"
>  #include "amdgpu_amdkfd.h"
>  #include "kfd_smi_events.h"
> +#include "amdgpu_object.h"
>  
>  static long kfd_ioctl(struct file *, unsigned int, unsigned long);
>  static int kfd_open(struct inode *, struct file *);
> @@ -1804,6 +1805,44 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
>  }
>  #endif
>  
> +static int criu_dump_process(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
> +{
> +	int ret;
> +	struct kfd_criu_process_bucket *process_bucket;
> +	struct kfd_criu_process_priv_data *process_priv;
> +
> +	if (args->num_objects != 1) {
> +		pr_err("Only 1 process supported\n");
> +		return -EINVAL;
> +	}
> +
> +	if (args->objects_size != sizeof(*process_bucket) + sizeof(*process_priv)) {
> +		pr_err("Invalid objects size for process\n");
> +		return -EINVAL;
> +	}
> +
> +	process_bucket = kzalloc(args->objects_size, GFP_KERNEL);
> +	if (!process_bucket)
> +		return -ENOMEM;
> +
> +	/* Private data starts after process bucket */
> +	process_priv = (void *)(process_bucket + 1);
> +
> +	process_priv->version = KFD_CRIU_PRIV_VERSION;
> +
> +	process_bucket->priv_data_offset = 0;
> +	process_bucket->priv_data_size = sizeof(*process_priv);
> +
> +	ret = copy_to_user((void __user *)args->objects, process_bucket, args->objects_size);
> +	if (ret) {
> +		pr_err("Failed to copy process information to user\n");
> +		ret = -EFAULT;
> +	}
> +
> +	kfree(process_bucket);
> +	return ret;
> +}
> +
>  uint64_t get_process_num_bos(struct kfd_process *p)
>  {
>  	uint64_t num_of_bos = 0, i;
> @@ -1824,12 +1863,151 @@ uint64_t get_process_num_bos(struct kfd_process *p)
>  	return num_of_bos;
>  }
>  
> +static int criu_dump_bos(struct kfd_process *p, struct kfd_ioctl_criu_dumper_args *args)
> +{
> +	struct kfd_criu_bo_bucket *bo_buckets;
> +	struct kfd_criu_bo_priv_data *bo_privs;
> +	uint64_t num_bos;
> +
> +	int ret = 0, pdd_index, bo_index = 0, id;
> +	void *mem;
> +
> +	num_bos = get_process_num_bos(p);
> +
> +	if (args->num_objects != num_bos) {
> +		pr_err("Mismatch with number of BOs (current:%lld user:%lld)\n",
> +				num_bos, args->num_objects);
> +		return -EINVAL;
> +	}
> +
> +	if (args->objects_size != args->num_objects * (sizeof(*bo_buckets) + sizeof(*bo_privs))) {
> +		pr_err("Invalid objects size for BOs\n");
> +		return -EINVAL;
> +	}
> +
> +	bo_buckets = kvzalloc(args->objects_size, GFP_KERNEL);
> +	if (!bo_buckets)
> +		return -ENOMEM;
> +
> +	/* Private data for first BO starts after all bo_buckets */
> +	bo_privs = (void *)(bo_buckets + args->num_objects);
> +
> +	for (pdd_index = 0; pdd_index < p->n_pdds; pdd_index++) {
> +		struct kfd_process_device *pdd = p->pdds[pdd_index];
> +		struct amdgpu_bo *dumper_bo;
> +		struct kgd_mem *kgd_mem;
> +
> +		idr_for_each_entry(&pdd->alloc_idr, mem, id) {
> +			struct kfd_criu_bo_bucket *bo_bucket;
> +			struct kfd_criu_bo_priv_data *bo_priv;
> +
> +			if (!mem) {
> +				ret = -ENOMEM;
> +				goto exit;
> +			}
> +
> +			kgd_mem = (struct kgd_mem *)mem;
> +			dumper_bo = kgd_mem->bo;
> +
> +			if ((uint64_t)kgd_mem->va <= pdd->gpuvm_base)
> +				continue;
> +
> +			bo_bucket = &bo_buckets[bo_index];
> +			bo_priv = &bo_privs[bo_index];
> +
> +			bo_bucket->addr = (uint64_t)kgd_mem->va;
> +			bo_bucket->size = amdgpu_bo_size(dumper_bo);
> +			bo_bucket->gpu_id = pdd->dev->id;
> +			bo_bucket->alloc_flags = (uint32_t)kgd_mem->alloc_flags;
> +
> +			bo_bucket->priv_data_offset = bo_index * sizeof(*bo_priv);
> +			bo_bucket->priv_data_size = sizeof(*bo_priv);
> +
> +			bo_priv->idr_handle = id;
> +			if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_USERPTR) {
> +				ret = amdgpu_ttm_tt_get_userptr(&dumper_bo->tbo,
> +								&bo_priv->user_addr);
> +				if (ret) {
> +					pr_err("Failed to obtain user address for user-pointer bo\n");
> +					goto exit;
> +				}
> +			}
> +			if (bo_bucket->alloc_flags & KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL)
> +				bo_bucket->offset = KFD_MMAP_TYPE_DOORBELL |
> +					KFD_MMAP_GPU_ID(pdd->dev->id);
> +			else if (bo_bucket->alloc_flags &
> +				KFD_IOC_ALLOC_MEM_FLAGS_MMIO_REMAP)
> +				bo_bucket->offset = KFD_MMAP_TYPE_MMIO |
> +					KFD_MMAP_GPU_ID(pdd->dev->id);
> +			else
> +				bo_bucket->offset = amdgpu_bo_mmap_offset(dumper_bo);
> +
> +			pr_debug("bo_size = 0x%llx, bo_addr = 0x%llx bo_offset = 0x%llx\n"
> +					"gpu_id = 0x%x alloc_flags = 0x%x idr_handle = 0x%x",
> +					bo_bucket->size,
> +					bo_bucket->addr,
> +					bo_bucket->offset,
> +					bo_bucket->gpu_id,
> +					bo_bucket->alloc_flags,
> +					bo_priv->idr_handle);
> +			bo_index++;
> +		}
> +	}
> +
> +	ret = copy_to_user((void __user *)args->objects, bo_buckets, args->objects_size);
> +	if (ret) {
> +		pr_err("Failed to copy bo information to user\n");
> +		ret = -EFAULT;
> +	}
> +
> +exit:
> +	kvfree(bo_buckets);
> +	return ret;
> +}
> +
>  static int kfd_ioctl_criu_dumper(struct file *filep,
>  				struct kfd_process *p, void *data)
>  {
> -	pr_debug("Inside %s\n", __func__);
> +	struct kfd_ioctl_criu_dumper_args *args = data;
> +	int ret;
>  
> -	return 0;
> +	pr_debug("CRIU dump type:%d\n", args->type);
> +
> +	if (!args->objects || !args->objects_size)
> +		return -EINVAL;
> +
> +	mutex_lock(&p->mutex);
> +
> +	if (!kfd_has_process_device_data(p)) {
> +		pr_err("No pdd for given process\n");
> +		ret = -ENODEV;
> +		goto err_unlock;
> +	}
> +
> +	switch (args->type) {
> +	case KFD_CRIU_OBJECT_TYPE_PROCESS:
> +		ret = criu_dump_process(p, args);
> +		break;
> +	case KFD_CRIU_OBJECT_TYPE_BO:
> +		ret = criu_dump_bos(p, args);
> +		break;
> +	case KFD_CRIU_OBJECT_TYPE_QUEUE:
> +	case KFD_CRIU_OBJECT_TYPE_EVENT:
> +	case KFD_CRIU_OBJECT_TYPE_DEVICE:
> +	case KFD_CRIU_OBJECT_TYPE_SVM_RANGE:
> +	default:
> +		pr_err("Unsupported object type:%d\n", args->type);
> +		ret = -EINVAL;
> +	}
> +
> +err_unlock:
> +	mutex_unlock(&p->mutex);
> +	if (ret)
> +		pr_err("Failed to dump CRIU type:%d ret:%d\n", args->type, ret);
> +	else
> +		pr_debug("CRIU dump type:%d ret:%d\n", args->type, ret);
> +
> +	return ret;
>  }
>  
>  static int kfd_ioctl_criu_restorer(struct file *filep,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 4e390006b4b6..8c9f2b3ac85d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -1031,7 +1031,8 @@ struct kfd_criu_device_priv_data {
>  };
>  
>  struct kfd_criu_bo_priv_data {
> -	uint64_t reserved;
> +	uint64_t user_addr;
> +	uint32_t idr_handle;
>  };
>  
>  struct kfd_criu_svm_range_priv_data {

* Re: [PATCH 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs
  2021-08-19 13:36 ` [PATCH 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs David Yat Sin
@ 2021-08-23 18:57   ` Felix Kuehling
  0 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2021-08-23 18:57 UTC (permalink / raw)
  To: David Yat Sin, amd-gfx; +Cc: rajneesh.bhardwaj


On 2021-08-19 at 9:36 a.m., David Yat Sin wrote:
> From: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
>
> Checkpoint-Restore in userspace (CRIU) is a powerful tool that can
> snapshot a running process and later restore it on the same or a remote
> machine, but it expects processes that have a device file (e.g. a GPU)
> associated with them to provide the necessary driver support to assist
> CRIU and its extensible plugin interface. Thus, in order to support
> Checkpoint-Restore of any ROCm process, the AMD Radeon Open Compute
> kernel driver needs to provide a set of new APIs that expose the
> necessary VRAM metadata and its contents to a userspace component
> (the CRIU plugin), which can store them in the form of image files.
>
> This introduces new ioctls which will be used to checkpoint-restore
> any KFD-bound user process. KFD doesn't allow arbitrary ioctl calls
> unless they come from the group leader process. Since these ioctls are
> expected to be called from a KFD CRIU plugin, which has elevated
> ptrace-attach privileges and CAP_SYS_ADMIN capabilities attached to the
> file descriptors, modify KFD to allow such calls.
>
> (API redesign suggested by Felix Kuehling and implemented by David Yat
> Sin)
>
> Signed-off-by: David Yat Sin <david.yatsin@amd.com>
> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
> (cherry picked from commit 72f4907135aed9c037b9f442a6055b51733b518a)
> (cherry picked from commit 33ff4953c5352f51d57a77ba8ae6614b7993e70d)
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c |  70 ++++++++++++++-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  69 ++++++++++++++
>  include/uapi/linux/kfd_ioctl.h           | 110 ++++++++++++++++++++++-
>  3 files changed, 247 insertions(+), 2 deletions(-)
>
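
For orientation, a hedged userspace-side sketch of how a CRIU plugin
would be expected to call one of these ioctls (the args struct name is
assumed from this patch's kfd_ioctl.h changes; error handling elided):

	#include <sys/ioctl.h>
	#include <linux/kfd_ioctl.h>

	/* The caller must be ptrace-attached to the target so the
	 * KFD_IOC_FLAG_PTRACE_ATTACHED check in kfd_ioctl() passes.
	 */
	static int query_process_info(int kfd_fd)
	{
		struct kfd_ioctl_criu_process_info_args args = {0};

		return ioctl(kfd_fd, AMDKFD_IOC_CRIU_PROCESS_INFO, &args);
	}
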
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index 059c3f1ca27d..a1b60d29aae1 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -33,6 +33,7 @@
>  #include <linux/time.h>
>  #include <linux/mm.h>
>  #include <linux/mman.h>
> +#include <linux/ptrace.h>
>  #include <linux/dma-buf.h>
>  #include <asm/processor.h>
>  #include "kfd_priv.h"
> @@ -1802,6 +1803,44 @@ static int kfd_ioctl_svm(struct file *filep, struct kfd_process *p, void *data)
>  	return -EPERM;
>  }
>  #endif
> +static int kfd_ioctl_criu_dumper(struct file *filep,
> +				struct kfd_process *p, void *data)
> +{
> +	pr_debug("Inside %s\n", __func__);
> +
> +	return 0;
> +}
> +
> +static int kfd_ioctl_criu_restorer(struct file *filep,
> +				struct kfd_process *p, void *data)
> +{
> +	pr_debug("Inside %s\n", __func__);
> +
> +	return 0;
> +}
> +
> +static int kfd_ioctl_criu_pause(struct file *filep, struct kfd_process *p, void *data)
> +{
> +	pr_debug("Inside %s\n", __func__);
> +
> +	return 0;
> +}
> +
> +static int kfd_ioctl_criu_resume(struct file *filep,
> +				struct kfd_process *p, void *data)
> +{
> +	pr_debug("Inside %s\n", __func__);
> +
> +	return 0;
> +}
> +
> +static int kfd_ioctl_criu_process_info(struct file *filep,
> +				struct kfd_process *p, void *data)
> +{
> +	pr_debug("Inside %s\n", __func__);
> +
> +	return 0;
> +}
>  
>  #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
>  	[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
> @@ -1906,6 +1945,21 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
>  
>  	AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_XNACK_MODE,
>  			kfd_ioctl_set_xnack_mode, 0),
> +
> +	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_DUMPER,
> +			 kfd_ioctl_criu_dumper, KFD_IOC_FLAG_PTRACE_ATTACHED),
> +
> +	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_RESTORER,
> +			 kfd_ioctl_criu_restorer, KFD_IOC_FLAG_ROOT_ONLY),
> +
> +	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_PROCESS_INFO,
> +			 kfd_ioctl_criu_process_info, KFD_IOC_FLAG_PTRACE_ATTACHED),
> +
> +	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_RESUME,
> +			 kfd_ioctl_criu_resume, KFD_IOC_FLAG_ROOT_ONLY),
> +
> +	AMDKFD_IOCTL_DEF(AMDKFD_IOC_CRIU_PAUSE,
> +			 kfd_ioctl_criu_pause, KFD_IOC_FLAG_PTRACE_ATTACHED),
>  };
>  
>  #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
> @@ -1920,6 +1974,7 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  	char *kdata = NULL;
>  	unsigned int usize, asize;
>  	int retcode = -EINVAL;
> +	bool ptrace_attached = false;
>  
>  	if (nr >= AMDKFD_CORE_IOCTL_COUNT)
>  		goto err_i1;
> @@ -1945,7 +2000,15 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  	 * processes need to create their own KFD device context.
>  	 */
>  	process = filep->private_data;
> -	if (process->lead_thread != current->group_leader) {
> +
> +	rcu_read_lock();
> +	if ((ioctl->flags & KFD_IOC_FLAG_PTRACE_ATTACHED) &&
> +	    ptrace_parent(process->lead_thread) == current)
> +		ptrace_attached = true;
> +	rcu_read_unlock();
> +
> +	if (process->lead_thread != current->group_leader
> +	    && !ptrace_attached) {
>  		dev_dbg(kfd_device, "Using KFD FD in wrong process\n");
>  		retcode = -EBADF;
>  		goto err_i1;
> @@ -1960,6 +2023,13 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  		goto err_i1;
>  	}
>  
> +	/* KFD_IOC_FLAG_ROOT_ONLY ioctls require the CAP_SYS_ADMIN capability */
> +	if (unlikely((ioctl->flags & KFD_IOC_FLAG_ROOT_ONLY) &&
> +		     !capable(CAP_SYS_ADMIN))) {
> +		retcode = -EACCES;
> +		goto err_i1;
> +	}
> +
>  	if (cmd & (IOC_IN | IOC_OUT)) {
>  		if (asize <= sizeof(stack_kdata)) {
>  			kdata = stack_kdata;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 64552f6b8ba4..768cc3fe95d2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -121,7 +121,35 @@
>   */
>  #define KFD_QUEUE_DOORBELL_MIRROR_OFFSET 512
>  
> +/**
> + * enum kfd_ioctl_flags - KFD ioctl flags
> + * Various flags that can be set in &amdkfd_ioctl_desc.flags to control how
> + * userspace can use a given ioctl.
> + */
> +enum kfd_ioctl_flags {
> +	/**
> +	 * @KFD_IOC_FLAG_ROOT_ONLY:
> +	 * Certain KFD ioctls such as AMDKFD_IOC_CRIU_RESTORER can potentially
> +	 * perform privileged operations and load arbitrary data into MQDs and
> +	 * eventually HQD registers when the queue is mapped by HWS. To
> +	 * prevent this, such ioctls require additional security checks. In
> +	 * other cases, ioctls such as AMDKFD_IOC_CRIU_RESUME may be called
> +	 * by an external process, e.g. the CRIU restore process, for each
> +	 * resuming task and thus also require elevated privileges.
> +	 *
> +	 * Restrict these ioctls to callers with the CAP_SYS_ADMIN capability.
> +	 */
> +	KFD_IOC_FLAG_ROOT_ONLY = BIT(0),
> +	/**
> +	 * @KFD_IOC_FLAG_PTRACE_ATTACHED:
> +	 * Certain KFD ioctls such as AMDKFD_IOC_CRIU_PROCESS_INFO and
> +	 * AMDKFD_IOC_CRIU_DUMPER are expected to be called during a
> +	 * checkpoint operation triggered by CRIU, from a context that
> +	 * is ptrace-attached to the target process. Verify this.
> +	 */
> +	KFD_IOC_FLAG_PTRACE_ATTACHED = BIT(1),
> +};
> 
>  /*
>   * Kernel module parameter to specify maximum number of supported queues per
>   * device
> @@ -977,6 +1005,47 @@ void kfd_process_set_trap_handler(struct qcm_process_device *qpd,
>  				  uint64_t tba_addr,
>  				  uint64_t tma_addr);
>  
> +/* CRIU */
> +/*
> + * Need to increment KFD_CRIU_PRIV_VERSION each time a change is made to any of the CRIU private
> + * structures:
> + * kfd_criu_process_priv_data
> + * kfd_criu_device_priv_data
> + * kfd_criu_bo_priv_data
> + * kfd_criu_queue_priv_data
> + * kfd_criu_event_priv_data
> + * kfd_criu_svm_range_priv_data
> + */
> +
> +#define KFD_CRIU_PRIV_VERSION 1
> +
> +struct kfd_criu_process_priv_data {
> +	uint32_t version;
> +};
> +
> +struct kfd_criu_device_priv_data {
> +	/* For future use */
> +	uint64_t reserved;
> +};
> +
> +struct kfd_criu_bo_priv_data {
> +	uint64_t reserved;
> +};
> +
> +struct kfd_criu_svm_range_priv_data {
> +	uint64_t reserved;
> +};
> +
> +struct kfd_criu_queue_priv_data {
> +	uint64_t reserved;
> +};
> +
> +struct kfd_criu_event_priv_data {
> +	uint64_t reserved;
> +};
> +
> +/* CRIU - End */
> +
>  /* Queue Context Management */
>  int init_queue(struct queue **q, const struct queue_properties *properties);
>  void uninit_queue(struct queue *q);
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index 3cb5b5dd9f77..19489e2ca58e 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -467,6 +467,99 @@ struct kfd_ioctl_smi_events_args {
>  	__u32 anon_fd;	/* from KFD */
>  };
>  
> +struct kfd_criu_process_bucket {
> +	__u64 priv_data_offset;
> +	__u64 priv_data_size;
> +};
> +
> +struct kfd_criu_device_bucket {
> +	__u64 priv_data_offset;
> +	__u64 priv_data_size;
> +	__u32 user_gpu_id;
> +	__u32 actual_gpu_id;
> +	__u32 drm_fd;
> +	__u32 pad;
> +};
> +
> +struct kfd_criu_bo_bucket {
> +	__u64 priv_data_offset;
> +	__u64 priv_data_size;
> +	__u64 addr;
> +	__u64 size;
> +	__u64 offset;
> +	__u64 restored_offset;
> +	__u32 gpu_id;
> +	__u32 alloc_flags;
> +	__u32 dmabuf_fd;
> +	__u32 pad;
> +};
> +
> +struct kfd_criu_queue_bucket {
> +	__u64 priv_data_offset;
> +	__u64 priv_data_size;
> +	__u32 gpu_id;
> +	__u32 pad;
> +};
> +
> +struct kfd_criu_event_bucket {
> +	__u64 priv_data_offset;
> +	__u64 priv_data_size;
> +	__u32 gpu_id;
> +	__u32 pad;
> +};
> +
> +struct kfd_ioctl_criu_process_info_args {
> +	__u64 process_priv_data_size;
> +	__u64 bos_priv_data_size;
> +	__u64 devices_priv_data_size;
> +	__u64 queues_priv_data_size;
> +	__u64 events_priv_data_size;
> +	__u64 svm_ranges_priv_data_size;
> +	__u64 total_bos;
> +	__u64 total_svm_ranges;
> +	__u32 total_devices;
> +	__u32 total_queues;
> +	__u32 total_events;
> +	__u32 task_pid;
> +};
> +
> +struct kfd_ioctl_criu_pause_args {
> +	__u32 pause;
> +	__u32 pad;
> +};
> +
> +enum kfd_criu_object_type {
> +	KFD_CRIU_OBJECT_TYPE_PROCESS	= 0,
> +	KFD_CRIU_OBJECT_TYPE_DEVICE	= 1,
> +	KFD_CRIU_OBJECT_TYPE_BO		= 2,
> +	KFD_CRIU_OBJECT_TYPE_QUEUE	= 3,
> +	KFD_CRIU_OBJECT_TYPE_EVENT	= 4,
> +	KFD_CRIU_OBJECT_TYPE_SVM_RANGE	= 5,
> +};
> +

Please add comments explaining the members of the ioctl args structures.
E.g. it's not obvious that "objects" is a user-mode pointer, or what the
semantics of the objects_index_start field are.

Regards,
  Felix
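
For illustration, the kind of member documentation being asked for might
look like the sketch below. The semantics here are inferred from this
patch series and are assumptions to be confirmed by the authors:
"objects" is taken to be a user-mode pointer to an array of
kfd_criu_*_bucket entries, and "objects_index_start" the index of the
first object of the given type handled by this call.

    struct kfd_ioctl_criu_dumper_args {
    	__u64 num_objects;          /* number of bucket entries in the array */
    	__u64 objects;              /* user pointer to an array of
    	                             * kfd_criu_*_bucket matching "type" */
    	__u64 objects_size;         /* size in bytes of the objects array,
    	                             * including trailing private data */
    	__u64 objects_index_start;  /* index of the first object of this
    	                             * type to process in this call */
    	__u32 type;                 /* enum kfd_criu_object_type */
    	__u32 pad;                  /* explicit padding, must be zero */
    };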


> +struct kfd_ioctl_criu_dumper_args {
> +	__u64 num_objects;
> +	__u64 objects;
> +	__u64 objects_size;
> +	__u64 objects_index_start;
> +	__u32 type; /* enum kfd_criu_object_type */
> +	__u32 pad;
> +};
> +
> +struct kfd_ioctl_criu_restorer_args {
> +	__u64 num_objects;
> +	__u64 objects;
> +	__u64 objects_size;
> +	__u64 objects_index_start;
> +	__u32 type; /* enum kfd_criu_object_type */
> +	__u32 pad;
> +};
> +
> +struct kfd_ioctl_criu_resume_args {
> +	__u32 pid;	/* to KFD */
> +	__u32 pad;
> +};
> +
>  /* Register offset inside the remapped mmio page
>   */
>  enum kfd_mmio_remap {
> @@ -740,7 +833,22 @@ struct kfd_ioctl_set_xnack_mode_args {
>  #define AMDKFD_IOC_SET_XNACK_MODE		\
>  		AMDKFD_IOWR(0x21, struct kfd_ioctl_set_xnack_mode_args)
>  
> +#define AMDKFD_IOC_CRIU_DUMPER			\
> +		AMDKFD_IOWR(0x22, struct kfd_ioctl_criu_dumper_args)
> +
> +#define AMDKFD_IOC_CRIU_RESTORER		\
> +		AMDKFD_IOWR(0x23, struct kfd_ioctl_criu_restorer_args)
> +
> +#define AMDKFD_IOC_CRIU_PROCESS_INFO		\
> +		AMDKFD_IOWR(0x24, struct kfd_ioctl_criu_process_info_args)
> +
> +#define AMDKFD_IOC_CRIU_RESUME			\
> +		AMDKFD_IOWR(0x25, struct kfd_ioctl_criu_resume_args)
> +
> +#define AMDKFD_IOC_CRIU_PAUSE			\
> +		AMDKFD_IOWR(0x26, struct kfd_ioctl_criu_pause_args)
> +
>  #define AMDKFD_COMMAND_START		0x01
> -#define AMDKFD_COMMAND_END		0x22
> +#define AMDKFD_COMMAND_END		0x27
>  
>  #endif
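
For context, here is a minimal user-space sketch (not part of the patch)
of how a checkpoint tool might drive these ioctls. The call order
(process_info, pause, dump, unpause) and the layout of the "objects"
buffer are assumptions based on the API names and argument structures
above, not something this patch specifies:

    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/kfd_ioctl.h>

    static int checkpoint_queues(int kfd_fd)
    {
    	struct kfd_ioctl_criu_process_info_args info = {0};
    	struct kfd_ioctl_criu_pause_args pause = { .pause = 1 };
    	struct kfd_ioctl_criu_dumper_args dump = {0};
    	void *buf;
    	int ret;

    	/* Query object counts and private data sizes. */
    	if (ioctl(kfd_fd, AMDKFD_IOC_CRIU_PROCESS_INFO, &info))
    		return -1;

    	/* Quiesce the target process before dumping its queues. */
    	if (ioctl(kfd_fd, AMDKFD_IOC_CRIU_PAUSE, &pause))
    		return -1;

    	dump.type = KFD_CRIU_OBJECT_TYPE_QUEUE;
    	dump.num_objects = info.total_queues;
    	/* Assumed layout: bucket array followed by private data. */
    	dump.objects_size = info.total_queues *
    			sizeof(struct kfd_criu_queue_bucket) +
    			info.queues_priv_data_size;
    	buf = calloc(1, dump.objects_size);
    	if (!buf)
    		return -1;
    	dump.objects = (uintptr_t)buf;	/* user pointer passed as __u64 */

    	ret = ioctl(kfd_fd, AMDKFD_IOC_CRIU_DUMPER, &dump);
    	/* ... on success, write buf out to the checkpoint image ... */

    	free(buf);
    	pause.pause = 0;	/* assumed to mean "unpause" */
    	ioctl(kfd_fd, AMDKFD_IOC_CRIU_PAUSE, &pause);
    	return ret;
    }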

Thread overview: 25+ messages
2021-08-19 13:36 [PATCH 00/18] CHECKPOINT RESTORE WITH ROCm David Yat Sin
2021-08-19 13:36 ` [PATCH 01/18] x86/configs: CRIU update release defconfig David Yat Sin
2021-08-19 13:36 ` [PATCH 02/18] x86/configs: CRIU update debug rock defconfig David Yat Sin
2021-08-19 13:36 ` [PATCH 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs David Yat Sin
2021-08-23 18:57   ` Felix Kuehling
2021-08-19 13:36 ` [PATCH 04/18] drm/amdkfd: CRIU Implement KFD process_info ioctl David Yat Sin
2021-08-19 13:37 ` [PATCH 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl David Yat Sin
2021-08-23 18:53   ` Felix Kuehling
2021-08-19 13:37 ` [PATCH 06/18] drm/amdkfd: CRIU Implement KFD restore ioctl David Yat Sin
2021-08-19 13:37 ` [PATCH 07/18] drm/amdkfd: CRIU Implement KFD resume ioctl David Yat Sin
2021-08-19 13:37 ` [PATCH 08/18] drm/amdkfd: CRIU Implement KFD pause ioctl David Yat Sin
2021-08-19 13:37 ` [PATCH 09/18] drm/amdkfd: CRIU add queues support David Yat Sin
2021-08-23 18:29   ` Felix Kuehling
2021-08-19 13:37 ` [PATCH 10/18] drm/amdkfd: CRIU restore queue ids David Yat Sin
2021-08-23 18:29   ` Felix Kuehling
2021-08-19 13:37 ` [PATCH 11/18] drm/amdkfd: CRIU restore sdma id for queues David Yat Sin
2021-08-19 13:37 ` [PATCH 12/18] drm/amdkfd: CRIU restore queue doorbell id David Yat Sin
2021-08-19 13:37 ` [PATCH 13/18] drm/amdkfd: CRIU dump and restore queue mqds David Yat Sin
2021-08-19 13:37 ` [PATCH 14/18] drm/amdkfd: CRIU dump/restore queue control stack David Yat Sin
2021-08-19 13:37 ` [PATCH 15/18] drm/amdkfd: CRIU dump and restore events David Yat Sin
2021-08-23 18:39   ` Felix Kuehling
2021-08-19 13:37 ` [PATCH 16/18] drm/amdkfd: CRIU implement gpu_id remapping David Yat Sin
2021-08-23 18:48   ` Felix Kuehling
2021-08-19 13:37 ` [PATCH 17/18] Revert "drm/amdgpu: Remove verify_access shortcut for KFD BOs" David Yat Sin
2021-08-19 13:37 ` [PATCH 18/18] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects David Yat Sin
