* [PATCH v2 0/8] RFC Support hot device unplug in amdgpu
@ 2020-06-21  6:03 Andrey Grodzovsky
  2020-06-21  6:03 ` [PATCH v2 1/8] drm: Add dummy page per device or GEM object Andrey Grodzovsky
                   ` (8 more replies)
  0 siblings, 9 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-21  6:03 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Andrey Grodzovsky, daniel.vetter, michel, ppaalanen,
	ckoenig.leichtzumerken, alexdeucher

This RFC is more of a proof of concept than a fully working solution, as there are a few unresolved issues we are hoping to get advice on from people on the mailing list.
Until now, extracting a card either by physical removal (e.g. an eGPU with a Thunderbolt connection) or by emulation through sysfs (/sys/bus/pci/devices/device_id/remove)
would cause random crashes in user apps. The random crashes were mostly due to an app that had mapped a device-backed BO into its address space still
trying to access the BO while the backing device was gone.
To address this first problem, Christian suggested fixing the handling of mapped memory in the clients when the device goes away by forcibly unmapping all buffers
held by user processes, clearing their respective VMAs that map the device BOs. Then, when a VMA tries to fill in its page tables again, the fault handler checks
whether the device has been removed and, if so, returns an error. This generates a SIGBUS to the application, which can then cleanly terminate.
This was indeed done, but it in turn created a problem of kernel OOPSes: while the app was terminating because of the SIGBUS,
it would trigger a use-after-free in the driver by accessing device structures that had already been released by the pci remove sequence.
This was handled by introducing a 'flush' sequence during device removal, where we wait for the drm file reference count to drop to 0, meaning all user clients directly using this device have terminated.
With this I was able to cleanly emulate device unplug with X and glxgears running, and later emulate plugging the device back and restarting X and glxgears.

v2:
Based on discussions on the mailing list with Daniel and Pekka [1], and based on the document Pekka produced from those discussions [2], the whole approach of returning SIGBUS
and waiting for all user clients holding CPU mappings of device BOs to die was dropped. Instead, as the document suggests, the device structures are kept alive until the last
reference to the device is dropped by a user client, and in the meantime all existing and new CPU mappings of the BOs belonging to the device, owned directly or through dma-buf import,
are rerouted to a per-user-process dummy rw page; a rough sketch of that rerouting follows below.
Also, I skipped the 'Requirements for KMS UAPI' section of [2], since I am trying to get the minimal set of requirements that still gives a useful solution to work, and that is the
'Requirements for Render and Cross-Device UAPI' section; accordingly, my test case is removing a secondary device, which is render only and not involved in KMS.
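
For illustration, the rerouting amounts to roughly the following (simplified from patch 2 below; drm_dev_enter()/drm_dev_exit() are the existing DRM unplug-detection helpers, the other names are placeholders):

	static vm_fault_t fault(struct vm_fault *vmf)
	{
		struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
		int idx;

		if (drm_dev_enter(bo->base.dev, &idx)) {
			/* Device still present - take the normal TTM fault path. */
			vm_fault_t ret = normal_fault_path(vmf);

			drm_dev_exit(idx);
			return ret;
		}

		/* Device gone - back the PTE with the dummy page instead. */
		get_page(dummy_page);
		vmf->page = dummy_page;
		return 0;
	}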
 
This iteration is still more of a draft, as I am still facing a few unsolved issues, such as a crash in the user client when trying to CPU-map an imported BO if the mapping happens after the device was
removed, and a HW failure to plug back a removed device. Also, since I don't have a real-life setup with an external GPU connected through TB, I am using sysfs to emulate pci remove, and I
expect to encounter more issues once I try this on a real-life case. I am also expecting some help on this from a user who volunteered to test in the related gitlab ticket [3].
So basically this is more a way to get feedback on whether I am moving in the right direction.

[1] - Discussions during v1 of the patchset https://lists.freedesktop.org/archives/dri-devel/2020-May/265386.html
[2] - drm/doc: device hot-unplug for userspace https://www.spinics.net/lists/dri-devel/msg259755.html
[3] - Related gitlab ticket https://gitlab.freedesktop.org/drm/amd/-/issues/1081
 

Andrey Grodzovsky (8):
  drm: Add dummy page per device or GEM object
  drm/ttm: Remap all page faults to per process dummy page.
  drm/ttm: Add unmapping of the entire device address space
  drm/amdgpu: Split amdgpu_device_fini into early and late
  drm/amdgpu: Refactor sysfs removal
  drm/amdgpu: Unmap entire device address space on device remove.
  drm/amdgpu: Fix sdma code crash post device unplug
  drm/amdgpu: Prevent any job recoveries after device is unplugged.

 drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 19 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 50 +++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c      | 23 ++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c      | 24 ++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h      |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c      |  8 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c      | 23 +++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c      |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c  | 21 ++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 +++++-
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++--
 drivers/gpu/drm/drm_file.c                   |  8 ++++
 drivers/gpu/drm/drm_prime.c                  | 10 +++++
 drivers/gpu/drm/ttm/ttm_bo.c                 |  8 +++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c              | 65 ++++++++++++++++++++++++----
 include/drm/drm_file.h                       |  2 +
 include/drm/drm_gem.h                        |  2 +
 include/drm/ttm/ttm_bo_driver.h              |  7 +++
 22 files changed, 286 insertions(+), 55 deletions(-)

-- 
2.7.4


* [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
@ 2020-06-21  6:03 ` Andrey Grodzovsky
  2020-06-22  9:35   ` Daniel Vetter
  2020-06-22 13:18   ` Christian König
  2020-06-21  6:03 ` [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page Andrey Grodzovsky
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-21  6:03 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Andrey Grodzovsky, daniel.vetter, michel, ppaalanen,
	ckoenig.leichtzumerken, alexdeucher

Will be used to reroute CPU-mapped BOs' page faults once the
device is removed.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/drm_file.c  |  8 ++++++++
 drivers/gpu/drm/drm_prime.c | 10 ++++++++++
 include/drm/drm_file.h      |  2 ++
 include/drm/drm_gem.h       |  2 ++
 4 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index c4c704e..67c0770 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
 			goto out_prime_destroy;
 	}
 
+	file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!file->dummy_page) {
+		ret = -ENOMEM;
+		goto out_prime_destroy;
+	}
+
 	return file;
 
 out_prime_destroy:
@@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
 	if (dev->driver->postclose)
 		dev->driver->postclose(dev, file);
 
+	__free_page(file->dummy_page);
+
 	drm_prime_destroy_file_private(&file->prime);
 
 	WARN_ON(!list_empty(&file->event_list));
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 1de2cde..c482e9c 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 
 	ret = drm_prime_add_buf_handle(&file_priv->prime,
 			dma_buf, *handle);
+
+	if (!ret) {
+		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!obj->dummy_page)
+			ret = -ENOMEM;
+	}
+
 	mutex_unlock(&file_priv->prime.lock);
 	if (ret)
 		goto fail;
@@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
 		dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
 	dma_buf = attach->dmabuf;
 	dma_buf_detach(attach->dmabuf, attach);
+
+	__free_page(obj->dummy_page);
+
 	/* remove the reference */
 	dma_buf_put(dma_buf);
 }
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 19df802..349a658 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -335,6 +335,8 @@ struct drm_file {
 	 */
 	struct drm_prime_file_private prime;
 
+	struct page *dummy_page;
+
 	/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
 	unsigned long lock_count; /* DRI1 legacy lock count */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 0b37506..47460d1 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -310,6 +310,8 @@ struct drm_gem_object {
 	 *
 	 */
 	const struct drm_gem_object_funcs *funcs;
+
+	struct page *dummy_page;
 };
 
 /**
-- 
2.7.4


* [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page.
  2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
  2020-06-21  6:03 ` [PATCH v2 1/8] drm: Add dummy page per device or GEM object Andrey Grodzovsky
@ 2020-06-21  6:03 ` Andrey Grodzovsky
  2020-06-22  9:41   ` Daniel Vetter
  2020-06-22 19:30   ` Christian König
  2020-06-21  6:03 ` [PATCH v2 3/8] drm/ttm: Add unmapping of the entire device address space Andrey Grodzovsky
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-21  6:03 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Andrey Grodzovsky, daniel.vetter, michel, ppaalanen,
	ckoenig.leichtzumerken, alexdeucher

On device removal, reroute all CPU mappings to a dummy page kept per
drm_file instance or per imported GEM object.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 65 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 57 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 389128b..2f8bf5e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -35,6 +35,8 @@
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
 #include <drm/drm_vma_manager.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
 #include <linux/mm.h>
 #include <linux/pfn_t.h>
 #include <linux/rbtree.h>
@@ -328,19 +330,66 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 	pgprot_t prot;
 	struct ttm_buffer_object *bo = vma->vm_private_data;
 	vm_fault_t ret;
+	int idx;
+	struct drm_device *ddev = bo->base.dev;
 
-	ret = ttm_bo_vm_reserve(bo, vmf);
-	if (ret)
-		return ret;
+	if (drm_dev_enter(ddev, &idx)) {
+		ret = ttm_bo_vm_reserve(bo, vmf);
+		if (ret)
+			goto exit;
+
+		prot = vma->vm_page_prot;
 
-	prot = vma->vm_page_prot;
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
-	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
+		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
+		if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
+			goto exit;
+
+		dma_resv_unlock(bo->base.resv);
+
+exit:
+		drm_dev_exit(idx);
 		return ret;
+	} else {
 
-	dma_resv_unlock(bo->base.resv);
+		struct drm_file *file = NULL;
+		struct page *dummy_page = NULL;
+		int handle;
 
-	return ret;
+		/* We are faulting on imported BO from dma_buf */
+		if (bo->base.dma_buf && bo->base.import_attach) {
+			dummy_page = bo->base.dummy_page;
+		/* We are faulting on non imported BO, find drm_file owning the BO*/
+		} else {
+			struct drm_gem_object *gobj;
+
+			mutex_lock(&ddev->filelist_mutex);
+			list_for_each_entry(file, &ddev->filelist, lhead) {
+				spin_lock(&file->table_lock);
+				idr_for_each_entry(&file->object_idr, gobj, handle) {
+					if (gobj == &bo->base) {
+						dummy_page = file->dummy_page;
+						break;
+					}
+				}
+				spin_unlock(&file->table_lock);
+			}
+			mutex_unlock(&ddev->filelist_mutex);
+		}
+
+		if (dummy_page) {
+			/*
+			 * Let do_fault complete the PTE install etc. using vmf->page
+			 *
+			 * TODO - should I call free_page somewhere?
+			 */
+			get_page(dummy_page);
+			vmf->page = dummy_page;
+			return 0;
+		} else {
+			return VM_FAULT_SIGSEGV;
+		}
+	}
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault);
 
-- 
2.7.4


* [PATCH v2 3/8] drm/ttm: Add unmapping of the entire device address space
  2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
  2020-06-21  6:03 ` [PATCH v2 1/8] drm: Add dummy page per device or GEM object Andrey Grodzovsky
  2020-06-21  6:03 ` [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page Andrey Grodzovsky
@ 2020-06-21  6:03 ` Andrey Grodzovsky
  2020-06-22  9:45   ` Daniel Vetter
                     ` (2 more replies)
  2020-06-21  6:03 ` [PATCH v2 4/8] drm/amdgpu: Split amdgpu_device_fini into early and late Andrey Grodzovsky
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-21  6:03 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Andrey Grodzovsky, daniel.vetter, michel, ppaalanen,
	ckoenig.leichtzumerken, alexdeucher

Helper function used to invalidate all BOs' CPU mappings once the
device is removed.
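
The intended call site (wired up in patch 6 of this series) is the driver's pci remove path, roughly:

	drm_dev_unplug(dev);
	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);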

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c    | 8 ++++++--
 include/drm/ttm/ttm_bo_driver.h | 7 +++++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index c5b516f..926a365 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1750,10 +1750,14 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
 	ttm_bo_unmap_virtual_locked(bo);
 	ttm_mem_io_unlock(man);
 }
-
-
 EXPORT_SYMBOL(ttm_bo_unmap_virtual);
 
+void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
+{
+	unmap_mapping_range(bdev->dev_mapping, 0, 0, 1);
+}
+EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
+
 int ttm_bo_wait(struct ttm_buffer_object *bo,
 		bool interruptible, bool no_wait)
 {
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index c9e0fd0..39ea44f 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -601,6 +601,13 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
 void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
 
 /**
+ * ttm_bo_unmap_virtual_address_space
+ *
+ * @bdev: tear down all the virtual mappings for this device
+ */
+void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev);
+
+/**
  * ttm_bo_unmap_virtual
  *
  * @bo: tear down the virtual mappings for this BO
-- 
2.7.4


* [PATCH v2 4/8] drm/amdgpu: Split amdgpu_device_fini into early and late
  2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
                   ` (2 preceding siblings ...)
  2020-06-21  6:03 ` [PATCH v2 3/8] drm/ttm: Add unmapping of the entire device address space Andrey Grodzovsky
@ 2020-06-21  6:03 ` Andrey Grodzovsky
  2020-06-22  9:48   ` Daniel Vetter
  2020-06-21  6:03 ` [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal Andrey Grodzovsky
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-21  6:03 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Andrey Grodzovsky, daniel.vetter, michel, ppaalanen,
	ckoenig.leichtzumerken, alexdeucher

Some of the work in amdgpu_device_fini, such as disabling HW interrupts
and finalizing pending fences, must be done right away on pci_remove,
while most of the work related to finalizing and releasing driver data
structures can be deferred until the drm_driver.release hook is called,
i.e. when the last reference to the device is dropped.
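
For reference, the resulting teardown flow is roughly (a sketch of the call graph this patch creates, not actual code):

	amdgpu_pci_remove()
	    drm_dev_unplug(dev)
	    amdgpu_driver_unload_kms(dev)
	        amdgpu_device_fini_early(adev)   /* IRQs off, fences finalized */
	    drm_dev_put(dev)

	/* ... later, once the last device reference is dropped ... */

	drm_driver.release -> amdgpu_driver_release_kms(dev)
	    amdgpu_device_fini_late(adev)        /* release driver structures */
	    kfree(adev)
	    drm_dev_fini(dev); kfree(dev)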

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  6 ++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 23 +++++++++++++++++------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
 7 files changed, 54 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2a806cb..604a681 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1003,7 +1003,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 		       struct drm_device *ddev,
 		       struct pci_dev *pdev,
 		       uint32_t flags);
-void amdgpu_device_fini(struct amdgpu_device *adev);
+void amdgpu_device_fini_early(struct amdgpu_device *adev);
+void amdgpu_device_fini_late(struct amdgpu_device *adev);
+
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
 
 void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
@@ -1188,6 +1190,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
 int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
 void amdgpu_driver_postclose_kms(struct drm_device *dev,
 				 struct drm_file *file_priv);
+void amdgpu_driver_release_kms(struct drm_device *dev);
+
 int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
 int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index cc41e8f..e7b9065 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2309,6 +2309,8 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
 {
 	int i, r;
 
+	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
+
 	amdgpu_ras_pre_fini(adev);
 
 	if (adev->gmc.xgmi.num_physical_nodes > 1)
@@ -3304,10 +3306,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  * Tear down the driver info (all asics).
  * Called at driver shutdown.
  */
-void amdgpu_device_fini(struct amdgpu_device *adev)
+void amdgpu_device_fini_early(struct amdgpu_device *adev)
 {
-	int r;
-
 	DRM_INFO("amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
 	adev->shutdown = true;
@@ -3330,7 +3330,13 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 	if (adev->pm_sysfs_en)
 		amdgpu_pm_sysfs_fini(adev);
 	amdgpu_fbdev_fini(adev);
-	r = amdgpu_device_ip_fini(adev);
+
+	amdgpu_irq_fini_early(adev);
+}
+
+void amdgpu_device_fini_late(struct amdgpu_device *adev)
+{
+	amdgpu_device_ip_fini(adev);
 	if (adev->firmware.gpu_info_fw) {
 		release_firmware(adev->firmware.gpu_info_fw);
 		adev->firmware.gpu_info_fw = NULL;
@@ -3368,6 +3374,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 		amdgpu_pmu_fini(adev);
 	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
 		amdgpu_discovery_fini(adev);
+
 }
 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 9e5afa5..43592dc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1134,12 +1134,9 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
 
-#ifdef MODULE
-	if (THIS_MODULE->state != MODULE_STATE_GOING)
-#endif
-		DRM_ERROR("Hotplug removal is not supported\n");
 	drm_dev_unplug(dev);
 	amdgpu_driver_unload_kms(dev);
+
 	pci_disable_device(pdev);
 	pci_set_drvdata(pdev, NULL);
 	drm_dev_put(dev);
@@ -1445,6 +1442,7 @@ static struct drm_driver kms_driver = {
 	.dumb_create = amdgpu_mode_dumb_create,
 	.dumb_map_offset = amdgpu_mode_dumb_mmap,
 	.fops = &amdgpu_driver_kms_fops,
+	.release = &amdgpu_driver_release_kms,
 
 	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 0cc4c67..1697655 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -49,6 +49,7 @@
 #include <drm/drm_irq.h>
 #include <drm/drm_vblank.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu.h"
 #include "amdgpu_ih.h"
 #include "atom.h"
@@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	return 0;
 }
 
+
+void amdgpu_irq_fini_early(struct amdgpu_device *adev)
+{
+	if (adev->irq.installed) {
+		drm_irq_uninstall(adev->ddev);
+		adev->irq.installed = false;
+		if (adev->irq.msi_enabled)
+			pci_free_irq_vectors(adev->pdev);
+
+		if (!amdgpu_device_has_dc_support(adev))
+			flush_work(&adev->hotplug_work);
+	}
+}
+
 /**
  * amdgpu_irq_fini - shut down interrupt handling
  *
@@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
 {
 	unsigned i, j;
 
-	if (adev->irq.installed) {
-		drm_irq_uninstall(adev->ddev);
-		adev->irq.installed = false;
-		if (adev->irq.msi_enabled)
-			pci_free_irq_vectors(adev->pdev);
-		if (!amdgpu_device_has_dc_support(adev))
-			flush_work(&adev->hotplug_work);
-	}
-
 	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
 		if (!adev->irq.client[i].sources)
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index c718e94..718c70f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
 
 int amdgpu_irq_init(struct amdgpu_device *adev);
 void amdgpu_irq_fini(struct amdgpu_device *adev);
+void amdgpu_irq_fini_early(struct amdgpu_device *adev);
 int amdgpu_irq_add_id(struct amdgpu_device *adev,
 		      unsigned client_id, unsigned src_id,
 		      struct amdgpu_irq_src *source);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index c0b1904..9d0af22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -29,6 +29,7 @@
 #include "amdgpu.h"
 #include <drm/drm_debugfs.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu_sched.h"
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
@@ -86,7 +87,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 	amdgpu_unregister_gpu_instance(adev);
 
 	if (adev->rmmio == NULL)
-		goto done_free;
+		return;
 
 	if (adev->runpm) {
 		pm_runtime_get_sync(dev->dev);
@@ -95,11 +96,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 
 	amdgpu_acpi_fini(adev);
 
-	amdgpu_device_fini(adev);
-
-done_free:
-	kfree(adev);
-	dev->dev_private = NULL;
+	amdgpu_device_fini_early(adev);
 }
 
 void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
@@ -1108,6 +1105,20 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	pm_runtime_put_autosuspend(dev->dev);
 }
 
+
+void amdgpu_driver_release_kms(struct drm_device *dev)
+{
+	struct amdgpu_device *adev = dev->dev_private;
+
+	amdgpu_device_fini_late(adev);
+
+	kfree(adev);
+	dev->dev_private = NULL;
+
+	drm_dev_fini(dev);
+	kfree(dev);
+}
+
 /*
  * VBlank related functions.
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7348619..169c2239 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2056,9 +2056,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
 
+	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
+
 	if (!con)
 		return 0;
 
+
 	/* Need disable ras on all IPs here before ip [hw/sw]fini */
 	amdgpu_ras_disable_all_features(adev, 0);
 	amdgpu_ras_recovery_fini(adev);
-- 
2.7.4


* [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
                   ` (3 preceding siblings ...)
  2020-06-21  6:03 ` [PATCH v2 4/8] drm/amdgpu: Split amdgpu_device_fini into early and late Andrey Grodzovsky
@ 2020-06-21  6:03 ` Andrey Grodzovsky
  2020-06-22  9:51   ` Daniel Vetter
  2020-06-22 13:19   ` Christian König
  2020-06-21  6:03 ` [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove Andrey Grodzovsky
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-21  6:03 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Andrey Grodzovsky, daniel.vetter, michel, ppaalanen,
	ckoenig.leichtzumerken, alexdeucher

Track sysfs files in a list so they can all be removed during pci remove;
otherwise, removing them later crashes because their parent folder was
already removed during pci remove.
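
Each creation site now follows the same pattern (a sketch with a hypothetical attribute 'foo' and show callback; the matching device_remove_file() calls then all happen in one place, amdgpu_sysfs_remove_files(), at pci remove time):

	static DEVICE_ATTR(foo, 0444, amdgpu_foo_show, NULL);
	static AMDGPU_DEVICE_ATTR_LIST_NODE(foo);

	ret = device_create_file(adev->dev, &dev_attr_foo);
	if (!ret) {
		mutex_lock(&adev->sysfs_files_list_lock);
		list_add_tail(&dev_attr_handle_foo.head, &adev->sysfs_files_list);
		mutex_unlock(&adev->sysfs_files_list_lock);
	}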

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 13 +++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 35 ++++++++++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 ++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 ++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 ++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 ++++++++++-
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++++---
 8 files changed, 99 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 604a681..ba3775f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -726,6 +726,15 @@ struct amd_powerplay {
 
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
+
+struct amdgpu_sysfs_list_node {
+	struct list_head head;
+	struct device_attribute *attr;
+};
+
+#define AMDGPU_DEVICE_ATTR_LIST_NODE(_attr) \
+	struct amdgpu_sysfs_list_node dev_attr_handle_##_attr = {.attr = &dev_attr_##_attr}
+
 struct amdgpu_device {
 	struct device			*dev;
 	struct drm_device		*ddev;
@@ -992,6 +1001,10 @@ struct amdgpu_device {
 	char				product_number[16];
 	char				product_name[32];
 	char				serial[16];
+
+	struct list_head sysfs_files_list;
+	struct mutex	 sysfs_files_list_lock;
+
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
index fdd52d8..c1549ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -1950,8 +1950,10 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct device *dev,
 	return snprintf(buf, PAGE_SIZE, "%s\n", ctx->vbios_version);
 }
 
+
 static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version,
 		   NULL);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(vbios_version);
 
 /**
  * amdgpu_atombios_fini - free the driver info and callbacks for atombios
@@ -1972,7 +1974,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
 	adev->mode_info.atom_context = NULL;
 	kfree(adev->mode_info.atom_card_info);
 	adev->mode_info.atom_card_info = NULL;
-	device_remove_file(adev->dev, &dev_attr_vbios_version);
 }
 
 /**
@@ -2038,6 +2039,10 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
 		return ret;
 	}
 
+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_vbios_version.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e7b9065..3173046 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2928,6 +2928,12 @@ static const struct attribute *amdgpu_dev_attributes[] = {
 	NULL
 };
 
+static AMDGPU_DEVICE_ATTR_LIST_NODE(product_name);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(product_number);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(serial_number);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(pcie_replay_count);
+
+
 /**
  * amdgpu_device_init - initialize the driver
  *
@@ -3029,6 +3035,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	INIT_LIST_HEAD(&adev->shadow_list);
 	mutex_init(&adev->shadow_list_lock);
 
+	INIT_LIST_HEAD(&adev->sysfs_files_list);
+	mutex_init(&adev->sysfs_files_list_lock);
+
 	INIT_DELAYED_WORK(&adev->delayed_init_work,
 			  amdgpu_device_delayed_init_work_handler);
 	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
@@ -3281,6 +3290,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	if (r) {
 		dev_err(adev->dev, "Could not create amdgpu device attr\n");
 		return r;
+	} else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_product_name.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_product_number.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_serial_number.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_pcie_replay_count.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
 	}
 
 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
@@ -3298,6 +3314,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	return r;
 }
 
+static void amdgpu_sysfs_remove_files(struct amdgpu_device *adev)
+{
+	struct amdgpu_sysfs_list_node *node;
+
+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_for_each_entry(node, &adev->sysfs_files_list, head)
+		device_remove_file(adev->dev, node->attr);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+}
+
 /**
  * amdgpu_device_fini - tear down the driver
  *
@@ -3332,6 +3358,11 @@ void amdgpu_device_fini_early(struct amdgpu_device *adev)
 	amdgpu_fbdev_fini(adev);
 
 	amdgpu_irq_fini_early(adev);
+
+	amdgpu_sysfs_remove_files(adev);
+
+	if (adev->ucode_sysfs_en)
+		amdgpu_ucode_sysfs_fini(adev);
 }
 
 void amdgpu_device_fini_late(struct amdgpu_device *adev)
@@ -3366,10 +3397,6 @@ void amdgpu_device_fini_late(struct amdgpu_device *adev)
 	adev->rmmio = NULL;
 	amdgpu_device_doorbell_fini(adev);
 
-	if (adev->ucode_sysfs_en)
-		amdgpu_ucode_sysfs_fini(adev);
-
-	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
 	if (IS_ENABLED(CONFIG_PERF_EVENTS))
 		amdgpu_pmu_fini(adev);
 	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 6271044..e7b6c4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -76,6 +76,9 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
 static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
 	           amdgpu_mem_info_gtt_used_show, NULL);
 
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_used);
+
 /**
  * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
  *
@@ -114,6 +117,11 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
 		return ret;
 	}
 
+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_mem_info_gtt_total.head, &adev->sysfs_files_list);
+	list_add_tail(&dev_attr_handle_mem_info_gtt_used.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	return 0;
 }
 
@@ -127,7 +135,6 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
  */
 static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
 {
-	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
 	struct amdgpu_gtt_mgr *mgr = man->priv;
 	spin_lock(&mgr->lock);
 	drm_mm_takedown(&mgr->mm);
@@ -135,9 +142,6 @@ static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
 	kfree(mgr);
 	man->priv = NULL;
 
-	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_total);
-	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_used);
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index ddb4af0c..554fec0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -2216,6 +2216,8 @@ static DEVICE_ATTR(usbc_pd_fw, S_IRUGO | S_IWUSR,
 		   psp_usbc_pd_fw_sysfs_read,
 		   psp_usbc_pd_fw_sysfs_write);
 
+static AMDGPU_DEVICE_ATTR_LIST_NODE(usbc_pd_fw);
+
 
 
 const struct amd_ip_funcs psp_ip_funcs = {
@@ -2242,13 +2244,17 @@ static int psp_sysfs_init(struct amdgpu_device *adev)
 
 	if (ret)
 		DRM_ERROR("Failed to create USBC PD FW control file!");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_usbc_pd_fw.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}
 
 	return ret;
 }
 
 static void psp_sysfs_fini(struct amdgpu_device *adev)
 {
-	device_remove_file(adev->dev, &dev_attr_usbc_pd_fw);
 }
 
 const struct amdgpu_ip_block_version psp_v3_1_ip_block =
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 7723937..39c400c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -148,6 +148,12 @@ static DEVICE_ATTR(mem_info_vis_vram_used, S_IRUGO,
 static DEVICE_ATTR(mem_info_vram_vendor, S_IRUGO,
 		   amdgpu_mem_info_vram_vendor, NULL);
 
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_total);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_used);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_used);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_vendor);
+
 static const struct attribute *amdgpu_vram_mgr_attributes[] = {
 	&dev_attr_mem_info_vram_total.attr,
 	&dev_attr_mem_info_vis_vram_total.attr,
@@ -184,6 +190,15 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
 	ret = sysfs_create_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
 	if (ret)
 		DRM_ERROR("Failed to register sysfs\n");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_mem_info_vram_total.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vis_vram_total.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vram_used.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vis_vram_used.head, &adev->sysfs_files_list);
+		list_add_tail(&dev_attr_handle_mem_info_vram_vendor.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}
 
 	return 0;
 }
@@ -198,7 +213,6 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
  */
 static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
 {
-	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
 	struct amdgpu_vram_mgr *mgr = man->priv;
 
 	spin_lock(&mgr->lock);
@@ -206,7 +220,6 @@ static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
 	spin_unlock(&mgr->lock);
 	kfree(mgr);
 	man->priv = NULL;
-	sysfs_remove_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 90610b4..455eaa4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -272,6 +272,9 @@ static ssize_t amdgpu_xgmi_show_error(struct device *dev,
 static DEVICE_ATTR(xgmi_device_id, S_IRUGO, amdgpu_xgmi_show_device_id, NULL);
 static DEVICE_ATTR(xgmi_error, S_IRUGO, amdgpu_xgmi_show_error, NULL);
 
+static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_device_id);
+static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_error);
+
 static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
 					 struct amdgpu_hive_info *hive)
 {
@@ -285,10 +288,19 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
 		return ret;
 	}
 
+	mutex_lock(&adev->sysfs_files_list_lock);
+	list_add_tail(&dev_attr_handle_xgmi_device_id.head, &adev->sysfs_files_list);
+	mutex_unlock(&adev->sysfs_files_list_lock);
+
 	/* Create xgmi error file */
 	ret = device_create_file(adev->dev, &dev_attr_xgmi_error);
 	if (ret)
 		pr_err("failed to create xgmi_error\n");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_xgmi_error.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}
 
 
 	/* Create sysfs link to hive info folder on the first device */
@@ -325,7 +337,6 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
 static void amdgpu_xgmi_sysfs_rem_dev_info(struct amdgpu_device *adev,
 					  struct amdgpu_hive_info *hive)
 {
-	device_remove_file(adev->dev, &dev_attr_xgmi_device_id);
 	sysfs_remove_link(&adev->dev->kobj, adev->ddev->unique);
 	sysfs_remove_link(hive->kobj, adev->ddev->unique);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
index a7b8292..f95b0b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
@@ -265,6 +265,8 @@ static ssize_t df_v3_6_get_df_cntr_avail(struct device *dev,
 /* device attr for available perfmon counters */
 static DEVICE_ATTR(df_cntr_avail, S_IRUGO, df_v3_6_get_df_cntr_avail, NULL);
 
+static AMDGPU_DEVICE_ATTR_LIST_NODE(df_cntr_avail);
+
 static void df_v3_6_query_hashes(struct amdgpu_device *adev)
 {
 	u32 tmp;
@@ -299,6 +301,11 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)
 	ret = device_create_file(adev->dev, &dev_attr_df_cntr_avail);
 	if (ret)
 		DRM_ERROR("failed to create file for available df counters\n");
+	else {
+		mutex_lock(&adev->sysfs_files_list_lock);
+		list_add_tail(&dev_attr_handle_df_cntr_avail.head, &adev->sysfs_files_list);
+		mutex_unlock(&adev->sysfs_files_list_lock);
+	}
 
 	for (i = 0; i < AMDGPU_MAX_DF_PERFMONS; i++)
 		adev->df_perfmon_config_assign_mask[i] = 0;
@@ -308,9 +315,6 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)
 
 static void df_v3_6_sw_fini(struct amdgpu_device *adev)
 {
-
-	device_remove_file(adev->dev, &dev_attr_df_cntr_avail);
-
 }
 
 static void df_v3_6_enable_broadcast_mode(struct amdgpu_device *adev,
-- 
2.7.4


* [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove.
  2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
                   ` (4 preceding siblings ...)
  2020-06-21  6:03 ` [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal Andrey Grodzovsky
@ 2020-06-21  6:03 ` Andrey Grodzovsky
  2020-06-22  9:56   ` Daniel Vetter
  2020-06-22 19:38   ` Christian König
  2020-06-21  6:03 ` [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug Andrey Grodzovsky
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-21  6:03 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Andrey Grodzovsky, daniel.vetter, michel, ppaalanen,
	ckoenig.leichtzumerken, alexdeucher

Use the new TTM interface to invalidate all existing BO CPU mappings
from all user processes.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 43592dc..6932d75 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1135,6 +1135,8 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 	struct drm_device *dev = pci_get_drvdata(pdev);
+	struct amdgpu_device *adev = dev->dev_private;
 
 	drm_dev_unplug(dev);
+	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
 	amdgpu_driver_unload_kms(dev);
 
 	pci_disable_device(pdev);
-- 
2.7.4


* [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug
  2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
                   ` (5 preceding siblings ...)
  2020-06-21  6:03 ` [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove Andrey Grodzovsky
@ 2020-06-21  6:03 ` Andrey Grodzovsky
  2020-06-22  9:55   ` Daniel Vetter
  2020-06-22 19:40   ` Christian König
  2020-06-21  6:03 ` [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged Andrey Grodzovsky
  2020-06-22  9:46 ` [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Daniel Vetter
  8 siblings, 2 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-21  6:03 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Andrey Grodzovsky, daniel.vetter, michel, ppaalanen,
	ckoenig.leichtzumerken, alexdeucher

entity->rq becomes NULL after the device is unplugged, so just return
early in that case.
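
The guard added here follows the usual drm_dev_enter()/drm_dev_exit() idiom for sections that dereference device structures which may vanish on unplug (generic sketch, not the exact hunk below):

	int idx;

	if (!drm_dev_enter(ddev, &idx))
		return -ENODEV; /* device was unplugged */

	/* ... safe to use entity->rq and friends here ... */

	drm_dev_exit(idx);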

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
index 8d9c6fe..d252427 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
@@ -24,6 +24,7 @@
 #include "amdgpu_job.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"
+#include <drm/drm_drv.h>
 
 #define AMDGPU_VM_SDMA_MIN_NUM_DW	256u
 #define AMDGPU_VM_SDMA_MAX_NUM_DW	(16u * 1024u)
@@ -94,7 +95,12 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 	struct drm_sched_entity *entity;
 	struct amdgpu_ring *ring;
 	struct dma_fence *f;
-	int r;
+	int r, idx;
+
+	if (!drm_dev_enter(p->adev->ddev, &idx)) {
+		r = -ENODEV;
+		goto nodev;
+	}
 
 	entity = p->immediate ? &p->vm->immediate : &p->vm->delayed;
 	ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
@@ -104,7 +110,7 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 	WARN_ON(ib->length_dw > p->num_dw_left);
 	r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);
 	if (r)
-		goto error;
+		goto job_fail;
 
 	if (p->unlocked) {
 		struct dma_fence *tmp = dma_fence_get(f);
@@ -118,10 +124,15 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
 	if (fence && !p->immediate)
 		swap(*fence, f);
 	dma_fence_put(f);
-	return 0;
 
-error:
-	amdgpu_job_free(p->job);
+	r = 0;
+
+job_fail:
+	drm_dev_exit(idx);
+nodev:
+	if (r)
+		amdgpu_job_free(p->job);
+
 	return r;
 }
 
-- 
2.7.4


* [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
                   ` (6 preceding siblings ...)
  2020-06-21  6:03 ` [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug Andrey Grodzovsky
@ 2020-06-21  6:03 ` Andrey Grodzovsky
  2020-06-22  9:53   ` Daniel Vetter
  2020-06-22  9:46 ` [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Daniel Vetter
  8 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-21  6:03 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Andrey Grodzovsky, daniel.vetter, michel, ppaalanen,
	ckoenig.leichtzumerken, alexdeucher

No point in trying recovery if the device is gone; it just messes things up.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 15 +++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6932d75..5d6d3d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1129,13 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
 	return ret;
 }
 
+static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
+{
+	int i;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->sched.thread)
+			continue;
+
+		cancel_delayed_work_sync(&ring->sched.work_tdr);
+	}
+}
+
 static void
 amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
 	struct amdgpu_device *adev = dev->dev_private;
 
 	drm_dev_unplug(dev);
+	amdgpu_cancel_all_tdr(adev);
 	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
 	amdgpu_driver_unload_kms(dev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4720718..87ff0c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,6 +28,8 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
+#include <drm/drm_drv.h>
+
 static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 {
 	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
@@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
 
 	memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
+	if (drm_dev_is_unplugged(adev->ddev)) {
+		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
+					  s_job->sched->name);
+		return;
+	}
+
 	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
 		DRM_ERROR("ring %s timeout, but soft recovered\n",
 			  s_job->sched->name);
-- 
2.7.4


* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-21  6:03 ` [PATCH v2 1/8] drm: Add dummy page per device or GEM object Andrey Grodzovsky
@ 2020-06-22  9:35   ` Daniel Vetter
  2020-06-22 14:21     ` Pekka Paalanen
                       ` (2 more replies)
  2020-06-22 13:18   ` Christian König
  1 sibling, 3 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22  9:35 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
> Will be used to reroute CPU-mapped BOs' page faults once the
> device is removed.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/drm_file.c  |  8 ++++++++
>  drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>  include/drm/drm_file.h      |  2 ++
>  include/drm/drm_gem.h       |  2 ++
>  4 files changed, 22 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index c4c704e..67c0770 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>  			goto out_prime_destroy;
>  	}
>  
> +	file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!file->dummy_page) {
> +		ret = -ENOMEM;
> +		goto out_prime_destroy;
> +	}
> +
>  	return file;
>  
>  out_prime_destroy:
> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>  	if (dev->driver->postclose)
>  		dev->driver->postclose(dev, file);
>  
> +	__free_page(file->dummy_page);
> +
>  	drm_prime_destroy_file_private(&file->prime);
>  
>  	WARN_ON(!list_empty(&file->event_list));
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 1de2cde..c482e9c 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>  
>  	ret = drm_prime_add_buf_handle(&file_priv->prime,
>  			dma_buf, *handle);
> +
> +	if (!ret) {
> +		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +		if (!obj->dummy_page)
> +			ret = -ENOMEM;
> +	}
> +
>  	mutex_unlock(&file_priv->prime.lock);
>  	if (ret)
>  		goto fail;
> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
>  		dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>  	dma_buf = attach->dmabuf;
>  	dma_buf_detach(attach->dmabuf, attach);
> +
> +	__free_page(obj->dummy_page);
> +
>  	/* remove the reference */
>  	dma_buf_put(dma_buf);
>  }
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 19df802..349a658 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -335,6 +335,8 @@ struct drm_file {
>  	 */
>  	struct drm_prime_file_private prime;
>  

Kerneldoc for these please, including why we need them and when. E.g. the
one in gem_bo should say it's only for exported buffers, so that we're not
colliding security spaces.
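
Something minimal along these lines would do (wording just illustrative):

	/**
	 * @dummy_page: Fallback page used to back CPU mappings of this
	 * file's BOs after the underlying device has been unplugged. The
	 * drm_gem_object copy is only needed for exported buffers.
	 */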

> +	struct page *dummy_page;
> +
>  	/* private: */
>  #if IS_ENABLED(CONFIG_DRM_LEGACY)
>  	unsigned long lock_count; /* DRI1 legacy lock count */
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 0b37506..47460d1 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -310,6 +310,8 @@ struct drm_gem_object {
>  	 *
>  	 */
>  	const struct drm_gem_object_funcs *funcs;
> +
> +	struct page *dummy_page;
>  };

I think amdgpu doesn't care, but everyone else still might care somewhat
about flink. That also shares buffers, so also needs to allocate the
per-bo dummy page.

I also wonder whether we shouldn't have a helper to look up the dummy
page, just to encode in core code how it's supposed to cascade.
-Daniel

>  
>  /**
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page.
  2020-06-21  6:03 ` [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page Andrey Grodzovsky
@ 2020-06-22  9:41   ` Daniel Vetter
  2020-06-24  3:31     ` Andrey Grodzovsky
  2020-11-10 17:41     ` Andrey Grodzovsky
  2020-06-22 19:30   ` Christian König
  1 sibling, 2 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22  9:41 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Sun, Jun 21, 2020 at 02:03:02AM -0400, Andrey Grodzovsky wrote:
> On device removal, reroute all CPU mappings to a dummy page kept per
> drm_file instance or per imported GEM object.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/ttm/ttm_bo_vm.c | 65 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 57 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 389128b..2f8bf5e 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -35,6 +35,8 @@
>  #include <drm/ttm/ttm_bo_driver.h>
>  #include <drm/ttm/ttm_placement.h>
>  #include <drm/drm_vma_manager.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
>  #include <linux/mm.h>
>  #include <linux/pfn_t.h>
>  #include <linux/rbtree.h>
> @@ -328,19 +330,66 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)

Hm I think diff and code flow look a bit bad now. What about renaming the
current function to __ttm_bo_vm_fault and then having something like the
below:

ttm_bo_vm_fault(args) {

	if (drm_dev_enter()) {
		__ttm_bo_vm_fault(args);
		drm_dev_exit();
	} else  {
		drm_gem_insert_dummy_pfn();
	}
}

I think drm_gem_insert_dummy_pfn() should be portable across drivers, so
another nice point to try to unify drivers as much as possible.
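
Fleshed out a bit, that could look like the following (pure sketch:
__ttm_bo_vm_fault would be the current function body,
drm_gem_insert_dummy_pfn a new helper that still needs writing):

	vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
	{
		struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
		vm_fault_t ret;
		int idx;

		if (drm_dev_enter(bo->base.dev, &idx)) {
			ret = __ttm_bo_vm_fault(vmf);
			drm_dev_exit(idx);
		} else {
			ret = drm_gem_insert_dummy_pfn(vmf);
		}

		return ret;
	}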
-Daniel

>  	pgprot_t prot;
>  	struct ttm_buffer_object *bo = vma->vm_private_data;
>  	vm_fault_t ret;
> +	int idx;
> +	struct drm_device *ddev = bo->base.dev;
>  
> -	ret = ttm_bo_vm_reserve(bo, vmf);
> -	if (ret)
> -		return ret;
> +	if (drm_dev_enter(ddev, &idx)) {
> +		ret = ttm_bo_vm_reserve(bo, vmf);
> +		if (ret)
> +			goto exit;
> +
> +		prot = vma->vm_page_prot;
>  
> -	prot = vma->vm_page_prot;
> -	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
> -	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> +		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
> +		if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> +			goto exit;
> +
> +		dma_resv_unlock(bo->base.resv);
> +
> +exit:
> +		drm_dev_exit(idx);
>  		return ret;
> +	} else {
>  
> -	dma_resv_unlock(bo->base.resv);
> +		struct drm_file *file = NULL;
> +		struct page *dummy_page = NULL;
> +		int handle;
>  
> -	return ret;
> +		/* We are faulting on imported BO from dma_buf */
> +		if (bo->base.dma_buf && bo->base.import_attach) {
> +			dummy_page = bo->base.dummy_page;
> +		/* We are faulting on non imported BO, find drm_file owning the BO*/

Uh, we can't fish that out of the vma->vm_file pointer somehow? Or is that
one all wrong? Doing this kind of list walk looks pretty horrible.

If the vma doesn't have the right pointer I guess next option is that we
store the drm_file page in gem_bo->dummy_page, and replace it on first
export. But that's going to be tricky to track ...

> +		} else {
> +			struct drm_gem_object *gobj;
> +
> +			mutex_lock(&ddev->filelist_mutex);
> +			list_for_each_entry(file, &ddev->filelist, lhead) {
> +				spin_lock(&file->table_lock);
> +				idr_for_each_entry(&file->object_idr, gobj, handle) {
> +					if (gobj == &bo->base) {
> +						dummy_page = file->dummy_page;
> +						break;
> +					}
> +				}
> +				spin_unlock(&file->table_lock);
> +			}
> +			mutex_unlock(&ddev->filelist_mutex);
> +		}
> +
> +		if (dummy_page) {
> +			/*
> +			 * Let do_fault complete the PTE install etc. using vmf->page
> +			 *
> +			 * TODO - should I call free_page somewhere?

Nah, instead don't call get_page. The page will be around as long as
there's a reference for the drm_file or gem_bo, which is longer than any
mmap. Otherwise yes this would leak really badly.

> +			 */
> +			get_page(dummy_page);
> +			vmf->page = dummy_page;
> +			return 0;
> +		} else {
> +			return VM_FAULT_SIGSEGV;

Hm that would be a kernel bug, wouldn't it? WARN_ON() required here imo.
-Daniel

> +		}
> +	}
>  }
>  EXPORT_SYMBOL(ttm_bo_vm_fault);
>  
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 3/8] drm/ttm: Add unmapping of the entire device address space
  2020-06-21  6:03 ` [PATCH v2 3/8] drm/ttm: Add unmapping of the entire device address space Andrey Grodzovsky
@ 2020-06-22  9:45   ` Daniel Vetter
  2020-06-23  5:00     ` Andrey Grodzovsky
  2020-06-22 19:37   ` Christian König
  2020-06-22 19:47   ` Alex Deucher
  2 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22  9:45 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Sun, Jun 21, 2020 at 02:03:03AM -0400, Andrey Grodzovsky wrote:
> Helper function to be used to invalidate all BOs CPU mappings
> once device is removed.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

This seems to be missing the code to invalidate all the dma-buf mmaps?

Probably needs more testcases if you're not yet catching this. Or am I
missing something, and we're exchanging the address space also for
dma-buf?
-Daniel
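
For context: mappings created through a dma-buf fd live in the dma-buf's
own address_space (dmabuf->file->f_mapping), not in bdev->dev_mapping, so
the new helper does not reach them. A rough sketch of what would be
needed on top, assuming the driver tracked its exported BOs (the
exported_list/export_link fields below are purely illustrative, they do
not exist):

	struct ttm_buffer_object *bo;

	/* Zap CPU mappings established via each exported dma-buf. */
	list_for_each_entry(bo, &bdev->exported_list, export_link)
		unmap_mapping_range(bo->base.dma_buf->file->f_mapping,
				    0, 0, 1);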

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c    | 8 ++++++--
>  include/drm/ttm/ttm_bo_driver.h | 7 +++++++
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index c5b516f..926a365 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1750,10 +1750,14 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
>  	ttm_bo_unmap_virtual_locked(bo);
>  	ttm_mem_io_unlock(man);
>  }
> -
> -
>  EXPORT_SYMBOL(ttm_bo_unmap_virtual);
>  
> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
> +{
> +	unmap_mapping_range(bdev->dev_mapping, 0, 0, 1);
> +}
> +EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
> +
>  int ttm_bo_wait(struct ttm_buffer_object *bo,
>  		bool interruptible, bool no_wait)
>  {
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index c9e0fd0..39ea44f 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -601,6 +601,13 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
>  void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
>  
>  /**
> + * ttm_bo_unmap_virtual_address_space
> + *
> + * @bdev: tear down all the virtual mappings for this device
> + */
> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev);
> +
> +/**
>   * ttm_bo_unmap_virtual
>   *
>   * @bo: tear down the virtual mappings for this BO
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 0/8] RFC Support hot device unplug in amdgpu
  2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
                   ` (7 preceding siblings ...)
  2020-06-21  6:03 ` [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged Andrey Grodzovsky
@ 2020-06-22  9:46 ` Daniel Vetter
  2020-06-23  5:14   ` Andrey Grodzovsky
  8 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22  9:46 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Sun, Jun 21, 2020 at 02:03:00AM -0400, Andrey Grodzovsky wrote:
> This RFC is more of a proof of concept then a fully working solution as there are a few unresolved issues we are hoping to get advise on from people on the mailing list.
> Until now extracting a card either by physical extraction (e.g. eGPU with thunderbolt connection or by emulation through syfs -> /sys/bus/pci/devices/device_id/remove)
> would cause random crashes in user apps. The random crashes in apps were mostly due to the app having mapped a device backed BO into its address space was still
> trying to access the BO while the backing device was gone.
> To answer this first problem Christian suggested to fix the handling of mapped memory in the clients when the device goes away by forcibly unmap all buffers
> the user processes has by clearing their respective VMAs mapping the device BOs. Then when the VMAs try to fill in the page tables again we check in the fault handler
> if the device is removed and if so, return an error. This will generate a SIGBUS to the application which can then cleanly terminate.
> This indeed was done but this in turn created a problem of kernel OOPs were the OOPSes were due to the fact that while the app was terminating because of the SIGBUS
> it would trigger use after free in the driver by calling to accesses device structures that were already released from the pci remove sequence.
> This was handled by introducing a 'flush' sequence during device removal were we wait for drm file reference to drop to 0 meaning all user clients directly using this device terminated.
> With this I was able to cleanly emulate device unplug with X and glxgears running and later emulate device plug back and restart of X and glxgears.
> 
> v2:
> Based on discussions in the mailing list with Daniel and Pekka [1] and based on the document produced by Pekka from those discussions [2] the whole approach with returning SIGBUS
> and waiting for all user clients having CPU mapping of device BOs to die was dropped. Instead as per the document suggestion the device structures are kept alive until the last
> reference to the device is dropped by user client and in the meanwhile all existing and new CPU mappings of the BOs belonging to the device directly or by dma-buf import are rerouted
> to per user process dummy rw page.
> Also, I skipped the 'Requirements for KMS UAPI' section of [2] since i am trying to get the minimal set of requiremnts that still give useful solution to work and this is the
> 'Requirements for Render and Cross-Device UAPI' section and so my test case is removing a secondary device, which is render only and is not involved in KMS.
>  
> This iteration is still more of a draft as I am still facing a few unsolved issues such as a crash in user client when trying to CPU map imported BO if the map happens after device was
> removed and HW failure to plug back a removed device. Also since i don't have real life setup with external GPU connected through TB I am using sysfs to emulate pci remove and i
> expect to encounter more issues once i try this on real life case. I am also expecting some help on this from a user who volunteered to test in the related gitlab ticket.
> So basically this is more of a way to get feedback on whether I am moving in the right direction.
> 
> [1] - Discussions during v1 of the patchset https://lists.freedesktop.org/archives/dri-devel/2020-May/265386.html
> [2] - drm/doc: device hot-unplug for userspace https://www.spinics.net/lists/dri-devel/msg259755.html
> [3] - Related gitlab ticket https://gitlab.freedesktop.org/drm/amd/-/issues/1081

A few high-level comments on the generic parts; I didn't really look at
the amdgpu side yet.

Also a nit: Please tell your mailer to break long lines, it looks funny
and inconsistent otherwise, at least in some of the mailers I use here :-/
-Daniel
>  
> 
> Andrey Grodzovsky (8):
>   drm: Add dummy page per device or GEM object
>   drm/ttm: Remap all page faults to per process dummy page.
>   drm/ttm: Add unmapping of the entire device address space
>   drm/amdgpu: Split amdgpu_device_fini into early and late
>   drm/amdgpu: Refactor sysfs removal
>   drm/amdgpu: Unmap entire device address space on device remove.
>   drm/amdgpu: Fix sdma code crash post device unplug
>   drm/amdgpu: Prevent any job recoveries after device is unplugged.
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 19 +++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 50 +++++++++++++++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c      | 23 ++++++++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 +++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c      | 24 ++++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h      |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c      |  8 ++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c      | 23 +++++++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c      |  3 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c  | 21 ++++++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 +++++-
>  drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++--
>  drivers/gpu/drm/drm_file.c                   |  8 ++++
>  drivers/gpu/drm/drm_prime.c                  | 10 +++++
>  drivers/gpu/drm/ttm/ttm_bo.c                 |  8 +++-
>  drivers/gpu/drm/ttm/ttm_bo_vm.c              | 65 ++++++++++++++++++++++++----
>  include/drm/drm_file.h                       |  2 +
>  include/drm/drm_gem.h                        |  2 +
>  include/drm/ttm/ttm_bo_driver.h              |  7 +++
>  22 files changed, 286 insertions(+), 55 deletions(-)
> 
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 4/8] drm/amdgpu: Split amdgpu_device_fini into early and late
  2020-06-21  6:03 ` [PATCH v2 4/8] drm/amdgpu: Split amdgpu_device_fini into early and late Andrey Grodzovsky
@ 2020-06-22  9:48   ` Daniel Vetter
  2020-11-12  4:19     ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22  9:48 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Sun, Jun 21, 2020 at 02:03:04AM -0400, Andrey Grodzovsky wrote:
> Some of the work in amdgpu_device_fini, such as disabling HW interrupts
> and finalizing pending fences, must be done right away on pci_remove,
> while most of the work related to finalizing and releasing driver data
> structures can be kept until the drm_driver.release hook is called, i.e.
> when the last device reference is dropped.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Long term I think best if as much of this code is converted over to devm
(for hw stuff) and drmm (for sw stuff and allocations). Doing this all
manually is very error prone.

I've started various such patches and others followed, but thus far only
very simple drivers tackled. But it should be doable step by step at
least, so you should have incremental benefits in code complexity right
away I hope.
-Daniel
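
For illustration, the drmm variant of the late teardown could look
roughly like this (untested sketch; the callback name is made up):

static void amdgpu_drmm_fini_late(struct drm_device *dev, void *arg)
{
	struct amdgpu_device *adev = arg;

	amdgpu_device_fini_late(adev);
}

	/* at init time, instead of wiring up a release hook by hand: */
	r = drmm_add_action_or_reset(adev->ddev, amdgpu_drmm_fini_late, adev);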

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++++++++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  6 ++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 23 +++++++++++++++++------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
>  7 files changed, 54 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 2a806cb..604a681 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1003,7 +1003,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>  		       struct drm_device *ddev,
>  		       struct pci_dev *pdev,
>  		       uint32_t flags);
> -void amdgpu_device_fini(struct amdgpu_device *adev);
> +void amdgpu_device_fini_early(struct amdgpu_device *adev);
> +void amdgpu_device_fini_late(struct amdgpu_device *adev);
> +
>  int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
>  
>  void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> @@ -1188,6 +1190,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
>  int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
>  void amdgpu_driver_postclose_kms(struct drm_device *dev,
>  				 struct drm_file *file_priv);
> +void amdgpu_driver_release_kms(struct drm_device *dev);
> +
>  int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
>  int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
>  int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index cc41e8f..e7b9065 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2309,6 +2309,8 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
>  {
>  	int i, r;
>  
> +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
> +
>  	amdgpu_ras_pre_fini(adev);
>  
>  	if (adev->gmc.xgmi.num_physical_nodes > 1)
> @@ -3304,10 +3306,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   * Tear down the driver info (all asics).
>   * Called at driver shutdown.
>   */
> -void amdgpu_device_fini(struct amdgpu_device *adev)
> +void amdgpu_device_fini_early(struct amdgpu_device *adev)
>  {
> -	int r;
> -
>  	DRM_INFO("amdgpu: finishing device.\n");
>  	flush_delayed_work(&adev->delayed_init_work);
>  	adev->shutdown = true;
> @@ -3330,7 +3330,13 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>  	if (adev->pm_sysfs_en)
>  		amdgpu_pm_sysfs_fini(adev);
>  	amdgpu_fbdev_fini(adev);
> -	r = amdgpu_device_ip_fini(adev);
> +
> +	amdgpu_irq_fini_early(adev);
> +}
> +
> +void amdgpu_device_fini_late(struct amdgpu_device *adev)
> +{
> +	amdgpu_device_ip_fini(adev);
>  	if (adev->firmware.gpu_info_fw) {
>  		release_firmware(adev->firmware.gpu_info_fw);
>  		adev->firmware.gpu_info_fw = NULL;
> @@ -3368,6 +3374,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>  		amdgpu_pmu_fini(adev);
>  	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
>  		amdgpu_discovery_fini(adev);
> +
>  }
>  
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 9e5afa5..43592dc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1134,12 +1134,9 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>  {
>  	struct drm_device *dev = pci_get_drvdata(pdev);
>  
> -#ifdef MODULE
> -	if (THIS_MODULE->state != MODULE_STATE_GOING)
> -#endif
> -		DRM_ERROR("Hotplug removal is not supported\n");
>  	drm_dev_unplug(dev);
>  	amdgpu_driver_unload_kms(dev);
> +
>  	pci_disable_device(pdev);
>  	pci_set_drvdata(pdev, NULL);
>  	drm_dev_put(dev);
> @@ -1445,6 +1442,7 @@ static struct drm_driver kms_driver = {
>  	.dumb_create = amdgpu_mode_dumb_create,
>  	.dumb_map_offset = amdgpu_mode_dumb_mmap,
>  	.fops = &amdgpu_driver_kms_fops,
> +	.release = &amdgpu_driver_release_kms,
>  
>  	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>  	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 0cc4c67..1697655 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -49,6 +49,7 @@
>  #include <drm/drm_irq.h>
>  #include <drm/drm_vblank.h>
>  #include <drm/amdgpu_drm.h>
> +#include <drm/drm_drv.h>
>  #include "amdgpu.h"
>  #include "amdgpu_ih.h"
>  #include "atom.h"
> @@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>  	return 0;
>  }
>  
> +
> +void amdgpu_irq_fini_early(struct amdgpu_device *adev)
> +{
> +	if (adev->irq.installed) {
> +		drm_irq_uninstall(adev->ddev);
> +		adev->irq.installed = false;
> +		if (adev->irq.msi_enabled)
> +			pci_free_irq_vectors(adev->pdev);
> +
> +		if (!amdgpu_device_has_dc_support(adev))
> +			flush_work(&adev->hotplug_work);
> +	}
> +}
> +
>  /**
>   * amdgpu_irq_fini - shut down interrupt handling
>   *
> @@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>  {
>  	unsigned i, j;
>  
> -	if (adev->irq.installed) {
> -		drm_irq_uninstall(adev->ddev);
> -		adev->irq.installed = false;
> -		if (adev->irq.msi_enabled)
> -			pci_free_irq_vectors(adev->pdev);
> -		if (!amdgpu_device_has_dc_support(adev))
> -			flush_work(&adev->hotplug_work);
> -	}
> -
>  	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>  		if (!adev->irq.client[i].sources)
>  			continue;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> index c718e94..718c70f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> @@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
>  
>  int amdgpu_irq_init(struct amdgpu_device *adev);
>  void amdgpu_irq_fini(struct amdgpu_device *adev);
> +void amdgpu_irq_fini_early(struct amdgpu_device *adev);
>  int amdgpu_irq_add_id(struct amdgpu_device *adev,
>  		      unsigned client_id, unsigned src_id,
>  		      struct amdgpu_irq_src *source);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index c0b1904..9d0af22 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -29,6 +29,7 @@
>  #include "amdgpu.h"
>  #include <drm/drm_debugfs.h>
>  #include <drm/amdgpu_drm.h>
> +#include <drm/drm_drv.h>
>  #include "amdgpu_sched.h"
>  #include "amdgpu_uvd.h"
>  #include "amdgpu_vce.h"
> @@ -86,7 +87,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>  	amdgpu_unregister_gpu_instance(adev);
>  
>  	if (adev->rmmio == NULL)
> -		goto done_free;
> +		return;
>  
>  	if (adev->runpm) {
>  		pm_runtime_get_sync(dev->dev);
> @@ -95,11 +96,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>  
>  	amdgpu_acpi_fini(adev);
>  
> -	amdgpu_device_fini(adev);
> -
> -done_free:
> -	kfree(adev);
> -	dev->dev_private = NULL;
> +	amdgpu_device_fini_early(adev);
>  }
>  
>  void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
> @@ -1108,6 +1105,20 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>  	pm_runtime_put_autosuspend(dev->dev);
>  }
>  
> +
> +void amdgpu_driver_release_kms (struct drm_device *dev)
> +{
> +	struct amdgpu_device *adev = dev->dev_private;
> +
> +	amdgpu_device_fini_late(adev);
> +
> +	kfree(adev);
> +	dev->dev_private = NULL;
> +
> +	drm_dev_fini(dev);
> +	kfree(dev);
> +}
> +
>  /*
>   * VBlank related functions.
>   */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 7348619..169c2239 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2056,9 +2056,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
>  {
>  	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>  
> +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
> +
>  	if (!con)
>  		return 0;
>  
> +
>  	/* Need disable ras on all IPs here before ip [hw/sw]fini */
>  	amdgpu_ras_disable_all_features(adev, 0);
>  	amdgpu_ras_recovery_fini(adev);
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-21  6:03 ` [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal Andrey Grodzovsky
@ 2020-06-22  9:51   ` Daniel Vetter
  2020-06-22 11:21     ` Greg KH
  2020-06-22 13:19   ` Christian König
  1 sibling, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22  9:51 UTC (permalink / raw)
  To: Andrey Grodzovsky, Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
> Track sysfs files in a list so they can all be removed during pci remove,
> since removing them after that point crashes because the parent folder
> was already removed during pci remove.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Uh I thought sysfs just gets yanked completely. Please check with Greg KH
whether hand-rolling all this really is the right solution here ... Feels
very wrong. I thought this was all supposed to work by adding attributes
before publishing the sysfs node, and then letting sysfs clean up
everything. Not by cleaning up manually yourself.

Adding Greg for an authoritative answer.
-Daniel

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 13 +++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 +++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 35 ++++++++++++++++++++++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 ++++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 ++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 ++++++++++++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 ++++++++++-
>  drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++++---
>  8 files changed, 99 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 604a681..ba3775f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -726,6 +726,15 @@ struct amd_powerplay {
>  
>  #define AMDGPU_RESET_MAGIC_NUM 64
>  #define AMDGPU_MAX_DF_PERFMONS 4
> +
> +struct amdgpu_sysfs_list_node {
> +	struct list_head head;
> +	struct device_attribute *attr;
> +};
> +
> +#define AMDGPU_DEVICE_ATTR_LIST_NODE(_attr) \
> +	struct amdgpu_sysfs_list_node dev_attr_handle_##_attr = {.attr = &dev_attr_##_attr}
> +
>  struct amdgpu_device {
>  	struct device			*dev;
>  	struct drm_device		*ddev;
> @@ -992,6 +1001,10 @@ struct amdgpu_device {
>  	char				product_number[16];
>  	char				product_name[32];
>  	char				serial[16];
> +
> +	struct list_head sysfs_files_list;
> +	struct mutex	 sysfs_files_list_lock;
> +
>  };
>  
>  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> index fdd52d8..c1549ee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> @@ -1950,8 +1950,10 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct device *dev,
>  	return snprintf(buf, PAGE_SIZE, "%s\n", ctx->vbios_version);
>  }
>  
> +
>  static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version,
>  		   NULL);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(vbios_version);
>  
>  /**
>   * amdgpu_atombios_fini - free the driver info and callbacks for atombios
> @@ -1972,7 +1974,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
>  	adev->mode_info.atom_context = NULL;
>  	kfree(adev->mode_info.atom_card_info);
>  	adev->mode_info.atom_card_info = NULL;
> -	device_remove_file(adev->dev, &dev_attr_vbios_version);
>  }
>  
>  /**
> @@ -2038,6 +2039,10 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
>  		return ret;
>  	}
>  
> +	mutex_lock(&adev->sysfs_files_list_lock);
> +	list_add_tail(&dev_attr_handle_vbios_version.head, &adev->sysfs_files_list);
> +	mutex_unlock(&adev->sysfs_files_list_lock);
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e7b9065..3173046 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2928,6 +2928,12 @@ static const struct attribute *amdgpu_dev_attributes[] = {
>  	NULL
>  };
>  
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(product_name);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(product_number);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(serial_number);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(pcie_replay_count);
> +
> +
>  /**
>   * amdgpu_device_init - initialize the driver
>   *
> @@ -3029,6 +3035,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>  	INIT_LIST_HEAD(&adev->shadow_list);
>  	mutex_init(&adev->shadow_list_lock);
>  
> +	INIT_LIST_HEAD(&adev->sysfs_files_list);
> +	mutex_init(&adev->sysfs_files_list_lock);
> +
>  	INIT_DELAYED_WORK(&adev->delayed_init_work,
>  			  amdgpu_device_delayed_init_work_handler);
>  	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
> @@ -3281,6 +3290,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>  	if (r) {
>  		dev_err(adev->dev, "Could not create amdgpu device attr\n");
>  		return r;
> +	} else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_product_name.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_product_number.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_serial_number.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_pcie_replay_count.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
>  	}
>  
>  	if (IS_ENABLED(CONFIG_PERF_EVENTS))
> @@ -3298,6 +3314,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>  	return r;
>  }
>  
> +static void amdgpu_sysfs_remove_files(struct amdgpu_device *adev)
> +{
> +	struct amdgpu_sysfs_list_node *node;
> +
> +	mutex_lock(&adev->sysfs_files_list_lock);
> +	list_for_each_entry(node, &adev->sysfs_files_list, head)
> +		device_remove_file(adev->dev, node->attr);
> +	mutex_unlock(&adev->sysfs_files_list_lock);
> +}
> +
>  /**
>   * amdgpu_device_fini - tear down the driver
>   *
> @@ -3332,6 +3358,11 @@ void amdgpu_device_fini_early(struct amdgpu_device *adev)
>  	amdgpu_fbdev_fini(adev);
>  
>  	amdgpu_irq_fini_early(adev);
> +
> +	amdgpu_sysfs_remove_files(adev);
> +
> +	if (adev->ucode_sysfs_en)
> +		amdgpu_ucode_sysfs_fini(adev);
>  }
>  
>  void amdgpu_device_fini_late(struct amdgpu_device *adev)
> @@ -3366,10 +3397,6 @@ void amdgpu_device_fini_late(struct amdgpu_device *adev)
>  	adev->rmmio = NULL;
>  	amdgpu_device_doorbell_fini(adev);
>  
> -	if (adev->ucode_sysfs_en)
> -		amdgpu_ucode_sysfs_fini(adev);
> -
> -	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
>  	if (IS_ENABLED(CONFIG_PERF_EVENTS))
>  		amdgpu_pmu_fini(adev);
>  	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> index 6271044..e7b6c4a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> @@ -76,6 +76,9 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
>  static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
>  	           amdgpu_mem_info_gtt_used_show, NULL);
>  
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_total);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_used);
> +
>  /**
>   * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
>   *
> @@ -114,6 +117,11 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
>  		return ret;
>  	}
>  
> +	mutex_lock(&adev->sysfs_files_list_lock);
> +	list_add_tail(&dev_attr_handle_mem_info_gtt_total.head, &adev->sysfs_files_list);
> +	list_add_tail(&dev_attr_handle_mem_info_gtt_used.head, &adev->sysfs_files_list);
> +	mutex_unlock(&adev->sysfs_files_list_lock);
> +
>  	return 0;
>  }
>  
> @@ -127,7 +135,6 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
>   */
>  static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
>  {
> -	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
>  	struct amdgpu_gtt_mgr *mgr = man->priv;
>  	spin_lock(&mgr->lock);
>  	drm_mm_takedown(&mgr->mm);
> @@ -135,9 +142,6 @@ static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
>  	kfree(mgr);
>  	man->priv = NULL;
>  
> -	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_total);
> -	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_used);
> -
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index ddb4af0c..554fec0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -2216,6 +2216,8 @@ static DEVICE_ATTR(usbc_pd_fw, S_IRUGO | S_IWUSR,
>  		   psp_usbc_pd_fw_sysfs_read,
>  		   psp_usbc_pd_fw_sysfs_write);
>  
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(usbc_pd_fw);
> +
>  
>  
>  const struct amd_ip_funcs psp_ip_funcs = {
> @@ -2242,13 +2244,17 @@ static int psp_sysfs_init(struct amdgpu_device *adev)
>  
>  	if (ret)
>  		DRM_ERROR("Failed to create USBC PD FW control file!");
> +	else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_usbc_pd_fw.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
> +	}
>  
>  	return ret;
>  }
>  
>  static void psp_sysfs_fini(struct amdgpu_device *adev)
>  {
> -	device_remove_file(adev->dev, &dev_attr_usbc_pd_fw);
>  }
>  
>  const struct amdgpu_ip_block_version psp_v3_1_ip_block =
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 7723937..39c400c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -148,6 +148,12 @@ static DEVICE_ATTR(mem_info_vis_vram_used, S_IRUGO,
>  static DEVICE_ATTR(mem_info_vram_vendor, S_IRUGO,
>  		   amdgpu_mem_info_vram_vendor, NULL);
>  
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_total);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_total);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_used);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_used);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_vendor);
> +
>  static const struct attribute *amdgpu_vram_mgr_attributes[] = {
>  	&dev_attr_mem_info_vram_total.attr,
>  	&dev_attr_mem_info_vis_vram_total.attr,
> @@ -184,6 +190,15 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
>  	ret = sysfs_create_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
>  	if (ret)
>  		DRM_ERROR("Failed to register sysfs\n");
> +	else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_mem_info_vram_total.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_mem_info_vis_vram_total.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_mem_info_vram_used.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_mem_info_vis_vram_used.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_mem_info_vram_vendor.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
> +	}
>  
>  	return 0;
>  }
> @@ -198,7 +213,6 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
>   */
>  static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
>  {
> -	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
>  	struct amdgpu_vram_mgr *mgr = man->priv;
>  
>  	spin_lock(&mgr->lock);
> @@ -206,7 +220,6 @@ static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
>  	spin_unlock(&mgr->lock);
>  	kfree(mgr);
>  	man->priv = NULL;
> -	sysfs_remove_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index 90610b4..455eaa4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -272,6 +272,9 @@ static ssize_t amdgpu_xgmi_show_error(struct device *dev,
>  static DEVICE_ATTR(xgmi_device_id, S_IRUGO, amdgpu_xgmi_show_device_id, NULL);
>  static DEVICE_ATTR(xgmi_error, S_IRUGO, amdgpu_xgmi_show_error, NULL);
>  
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_device_id);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_error);
> +
>  static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
>  					 struct amdgpu_hive_info *hive)
>  {
> @@ -285,10 +288,19 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
>  		return ret;
>  	}
>  
> +	mutex_lock(&adev->sysfs_files_list_lock);
> +	list_add_tail(&dev_attr_handle_xgmi_device_id.head, &adev->sysfs_files_list);
> +	mutex_unlock(&adev->sysfs_files_list_lock);
> +
>  	/* Create xgmi error file */
>  	ret = device_create_file(adev->dev, &dev_attr_xgmi_error);
>  	if (ret)
>  		pr_err("failed to create xgmi_error\n");
> +	else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_xgmi_error.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
> +	}
>  
>  
>  	/* Create sysfs link to hive info folder on the first device */
> @@ -325,7 +337,6 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
>  static void amdgpu_xgmi_sysfs_rem_dev_info(struct amdgpu_device *adev,
>  					  struct amdgpu_hive_info *hive)
>  {
> -	device_remove_file(adev->dev, &dev_attr_xgmi_device_id);
>  	sysfs_remove_link(&adev->dev->kobj, adev->ddev->unique);
>  	sysfs_remove_link(hive->kobj, adev->ddev->unique);
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> index a7b8292..f95b0b2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> +++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> @@ -265,6 +265,8 @@ static ssize_t df_v3_6_get_df_cntr_avail(struct device *dev,
>  /* device attr for available perfmon counters */
>  static DEVICE_ATTR(df_cntr_avail, S_IRUGO, df_v3_6_get_df_cntr_avail, NULL);
>  
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(df_cntr_avail);
> +
>  static void df_v3_6_query_hashes(struct amdgpu_device *adev)
>  {
>  	u32 tmp;
> @@ -299,6 +301,11 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)
>  	ret = device_create_file(adev->dev, &dev_attr_df_cntr_avail);
>  	if (ret)
>  		DRM_ERROR("failed to create file for available df counters\n");
> +	else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_df_cntr_avail.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
> +	}
>  
>  	for (i = 0; i < AMDGPU_MAX_DF_PERFMONS; i++)
>  		adev->df_perfmon_config_assign_mask[i] = 0;
> @@ -308,9 +315,6 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)
>  
>  static void df_v3_6_sw_fini(struct amdgpu_device *adev)
>  {
> -
> -	device_remove_file(adev->dev, &dev_attr_df_cntr_avail);
> -
>  }
>  
>  static void df_v3_6_enable_broadcast_mode(struct amdgpu_device *adev,
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-06-21  6:03 ` [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged Andrey Grodzovsky
@ 2020-06-22  9:53   ` Daniel Vetter
  2020-11-17 18:38     ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22  9:53 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
> No point in trying recovery if the device is gone; it just messes things up.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 6932d75..5d6d3d9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>  	return ret;
>  }
>  
> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
> +{
> +	int i;
> +
> +	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> +		struct amdgpu_ring *ring = adev->rings[i];
> +
> +		if (!ring || !ring->sched.thread)
> +			continue;
> +
> +		cancel_delayed_work_sync(&ring->sched.work_tdr);
> +	}
> +}

I think this is a function that's supposed to be in drm/scheduler, not
here. Might also just be your cleanup code being ordered wrongly, or your
split in one of the earlier patches not done quite right.
-Daniel
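
Something like the below in drm/scheduler, so drivers do not poke at
scheduler internals directly (sketch; the helper name is made up):

void drm_sched_cancel_timeout(struct drm_gpu_scheduler *sched)
{
	/* Ensure no timeout handler is queued or still running. */
	cancel_delayed_work_sync(&sched->work_tdr);
}
EXPORT_SYMBOL(drm_sched_cancel_timeout);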

> +
>  static void
>  amdgpu_pci_remove(struct pci_dev *pdev)
>  {
>  	struct drm_device *dev = pci_get_drvdata(pdev);
> +	struct amdgpu_device *adev = dev->dev_private;
>  
>  	drm_dev_unplug(dev);
> +	amdgpu_cancel_all_tdr(adev);
>  	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>  	amdgpu_driver_unload_kms(dev);
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 4720718..87ff0c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -28,6 +28,8 @@
>  #include "amdgpu.h"
>  #include "amdgpu_trace.h"
>  
> +#include <drm/drm_drv.h>
> +
>  static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>  {
>  	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>  
>  	memset(&ti, 0, sizeof(struct amdgpu_task_info));
>  
> +	if (drm_dev_is_unplugged(adev->ddev)) {
> +		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
> +					  s_job->sched->name);
> +		return;
> +	}
> +
>  	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
>  		DRM_ERROR("ring %s timeout, but soft recovered\n",
>  			  s_job->sched->name);
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug
  2020-06-21  6:03 ` [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug Andrey Grodzovsky
@ 2020-06-22  9:55   ` Daniel Vetter
  2020-06-22 19:40   ` Christian König
  1 sibling, 0 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22  9:55 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Sun, Jun 21, 2020 at 02:03:07AM -0400, Andrey Grodzovsky wrote:
> entity->rq becomes NULL after the device is unplugged, so just return
> early in that case.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

That looks very deep in amdgpu internals ... how do you even get in here
after the device is fully unplugged on the sw side?

Is this amdkfd doing something stupid because it's entirely unaware of what
amdgpu has done? Something else? This feels like duct-taping over a more
fundamental problem: after hotunplug no one should be able to even submit
anything new, or do BO moves, or anything really.
-Daniel
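
I.e. the check arguably belongs at the top-level entry points, along
these lines (sketch only; the _impl split is illustrative):

static int amdgpu_cs_ioctl(struct drm_device *dev, void *data,
			   struct drm_file *filp)
{
	int idx, r;

	/* Reject new submissions outright once the device is gone. */
	if (!drm_dev_enter(dev, &idx))
		return -ENODEV;

	r = amdgpu_cs_ioctl_impl(dev, data, filp);

	drm_dev_exit(idx);
	return r;
}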

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index 8d9c6fe..d252427 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -24,6 +24,7 @@
>  #include "amdgpu_job.h"
>  #include "amdgpu_object.h"
>  #include "amdgpu_trace.h"
> +#include <drm/drm_drv.h>
>  
>  #define AMDGPU_VM_SDMA_MIN_NUM_DW	256u
>  #define AMDGPU_VM_SDMA_MAX_NUM_DW	(16u * 1024u)
> @@ -94,7 +95,12 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>  	struct drm_sched_entity *entity;
>  	struct amdgpu_ring *ring;
>  	struct dma_fence *f;
> -	int r;
> +	int r, idx;
> +
> +	if (!drm_dev_enter(p->adev->ddev, &idx)) {
> +		r = -ENODEV;
> +		goto nodev;
> +	}
>  
>  	entity = p->immediate ? &p->vm->immediate : &p->vm->delayed;
>  	ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
> @@ -104,7 +110,7 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>  	WARN_ON(ib->length_dw > p->num_dw_left);
>  	r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);
>  	if (r)
> -		goto error;
> +		goto job_fail;
>  
>  	if (p->unlocked) {
>  		struct dma_fence *tmp = dma_fence_get(f);
> @@ -118,10 +124,15 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>  	if (fence && !p->immediate)
>  		swap(*fence, f);
>  	dma_fence_put(f);
> -	return 0;
>  
> -error:
> -	amdgpu_job_free(p->job);
> +	r = 0;
> +
> +job_fail:
> +	drm_dev_exit(idx);
> +nodev:
> +	if (r)
> +		amdgpu_job_free(p->job);
> +
>  	return r;
>  }
>  
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove.
  2020-06-21  6:03 ` [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove Andrey Grodzovsky
@ 2020-06-22  9:56   ` Daniel Vetter
  2020-06-22 19:38   ` Christian König
  1 sibling, 0 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22  9:56 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Sun, Jun 21, 2020 at 02:03:06AM -0400, Andrey Grodzovsky wrote:
> Use the new TTM interface to invalidate all existing BO CPU mappings
> from all user processes.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 43592dc..6932d75 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1135,6 +1135,7 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>  	struct drm_device *dev = pci_get_drvdata(pdev);
>  
>  	drm_dev_unplug(dev);
> +	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>  	amdgpu_driver_unload_kms(dev);

Hm, a ttm (or maybe even vram helper) function which wraps drm_dev_unplug +
ttm unmapping into one would be nice, I think? I suspect there's going to
be more here in the future.
-Daniel
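
E.g. (sketch only; name and placement made up):

void ttm_bo_device_unplug(struct drm_device *dev,
			  struct ttm_bo_device *bdev)
{
	/* Stop new ioctls first, then zap all existing CPU mappings. */
	drm_dev_unplug(dev);
	ttm_bo_unmap_virtual_address_space(bdev);
}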

>  
>  	pci_disable_device(pdev);
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-22  9:51   ` Daniel Vetter
@ 2020-06-22 11:21     ` Greg KH
  2020-06-22 16:07       ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Greg KH @ 2020-06-22 11:21 UTC (permalink / raw)
  To: Daniel Vetter, Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
> > Track sysfs files in a list so they can all be removed during pci remove,
> > since removing them after that point crashes because the parent folder
> > was already removed during pci remove.

Huh?  That should not happen, do you have a backtrace of that crash?

> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> 
> Uh I thought sysfs just gets yanked completely. Please check with Greg KH
> whether hand-rolling all this really is the right solution here ... Feels
> very wrong. I thought this was all supposed to work by adding attributes
> before publishing the sysfs node, and then letting sysfs clean up
> everything. Not by cleaning up manually yourself.

Yes, that is supposed to be the correct thing to do.

> 
> Adding Greg for an authoritative answer.
> -Daniel
> 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 13 +++++++++++
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 +++++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 35 ++++++++++++++++++++++++----
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 ++++++----
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 ++++++-
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 ++++++++++++--
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 ++++++++++-
> >  drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++++---
> >  8 files changed, 99 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index 604a681..ba3775f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -726,6 +726,15 @@ struct amd_powerplay {
> >  
> >  #define AMDGPU_RESET_MAGIC_NUM 64
> >  #define AMDGPU_MAX_DF_PERFMONS 4
> > +
> > +struct amdgpu_sysfs_list_node {
> > +	struct list_head head;
> > +	struct device_attribute *attr;
> > +};

You know we have lists of attributes already, called attribute groups,
if you really wanted to do something like this.  But, I don't think so.

Either way, don't hand-roll your own stuff that the driver core has
provided for you for a decade or more, that's just foolish :)
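
For one of the attributes above that would look roughly like this
(sketch):

static struct attribute *amdgpu_vbios_attrs[] = {
	&dev_attr_vbios_version.attr,
	NULL
};

static const struct attribute_group amdgpu_vbios_attr_group = {
	.attrs = amdgpu_vbios_attrs,
};

	/* in init: */
	ret = sysfs_create_group(&adev->dev->kobj, &amdgpu_vbios_attr_group);

	/* in teardown: */
	sysfs_remove_group(&adev->dev->kobj, &amdgpu_vbios_attr_group);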

> > +
> > +#define AMDGPU_DEVICE_ATTR_LIST_NODE(_attr) \
> > +	struct amdgpu_sysfs_list_node dev_attr_handle_##_attr = {.attr = &dev_attr_##_attr}
> > +
> >  struct amdgpu_device {
> >  	struct device			*dev;
> >  	struct drm_device		*ddev;
> > @@ -992,6 +1001,10 @@ struct amdgpu_device {
> >  	char				product_number[16];
> >  	char				product_name[32];
> >  	char				serial[16];
> > +
> > +	struct list_head sysfs_files_list;
> > +	struct mutex	 sysfs_files_list_lock;
> > +
> >  };
> >  
> >  static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> > index fdd52d8..c1549ee 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> > @@ -1950,8 +1950,10 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct device *dev,
> >  	return snprintf(buf, PAGE_SIZE, "%s\n", ctx->vbios_version);
> >  }
> >  
> > +
> >  static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version,
> >  		   NULL);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(vbios_version);
> >  
> >  /**
> >   * amdgpu_atombios_fini - free the driver info and callbacks for atombios
> > @@ -1972,7 +1974,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
> >  	adev->mode_info.atom_context = NULL;
> >  	kfree(adev->mode_info.atom_card_info);
> >  	adev->mode_info.atom_card_info = NULL;
> > -	device_remove_file(adev->dev, &dev_attr_vbios_version);
> >  }
> >  
> >  /**
> > @@ -2038,6 +2039,10 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
> >  		return ret;
> >  	}
> >  
> > +	mutex_lock(&adev->sysfs_files_list_lock);
> > +	list_add_tail(&dev_attr_handle_vbios_version.head, &adev->sysfs_files_list);
> > +	mutex_unlock(&adev->sysfs_files_list_lock);
> > +
> >  	return 0;
> >  }
> >  
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index e7b9065..3173046 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2928,6 +2928,12 @@ static const struct attribute *amdgpu_dev_attributes[] = {
> >  	NULL
> >  };
> >  
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(product_name);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(product_number);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(serial_number);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(pcie_replay_count);
> > +
> > +
> >  /**
> >   * amdgpu_device_init - initialize the driver
> >   *
> > @@ -3029,6 +3035,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >  	INIT_LIST_HEAD(&adev->shadow_list);
> >  	mutex_init(&adev->shadow_list_lock);
> >  
> > +	INIT_LIST_HEAD(&adev->sysfs_files_list);
> > +	mutex_init(&adev->sysfs_files_list_lock);
> > +
> >  	INIT_DELAYED_WORK(&adev->delayed_init_work,
> >  			  amdgpu_device_delayed_init_work_handler);
> >  	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
> > @@ -3281,6 +3290,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >  	if (r) {
> >  		dev_err(adev->dev, "Could not create amdgpu device attr\n");
> >  		return r;
> > +	} else {
> > +		mutex_lock(&adev->sysfs_files_list_lock);
> > +		list_add_tail(&dev_attr_handle_product_name.head, &adev->sysfs_files_list);
> > +		list_add_tail(&dev_attr_handle_product_number.head, &adev->sysfs_files_list);
> > +		list_add_tail(&dev_attr_handle_serial_number.head, &adev->sysfs_files_list);
> > +		list_add_tail(&dev_attr_handle_pcie_replay_count.head, &adev->sysfs_files_list);
> > +		mutex_unlock(&adev->sysfs_files_list_lock);
> >  	}
> >  
> >  	if (IS_ENABLED(CONFIG_PERF_EVENTS))
> > @@ -3298,6 +3314,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> >  	return r;
> >  }
> >  
> > +static void amdgpu_sysfs_remove_files(struct amdgpu_device *adev)
> > +{
> > +	struct amdgpu_sysfs_list_node *node;
> > +
> > +	mutex_lock(&adev->sysfs_files_list_lock);
> > +	list_for_each_entry(node, &adev->sysfs_files_list, head)
> > +		device_remove_file(adev->dev, node->attr);
> > +	mutex_unlock(&adev->sysfs_files_list_lock);
> > +}
> > +
> >  /**
> >   * amdgpu_device_fini - tear down the driver
> >   *
> > @@ -3332,6 +3358,11 @@ void amdgpu_device_fini_early(struct amdgpu_device *adev)
> >  	amdgpu_fbdev_fini(adev);
> >  
> >  	amdgpu_irq_fini_early(adev);
> > +
> > +	amdgpu_sysfs_remove_files(adev);
> > +
> > +	if (adev->ucode_sysfs_en)
> > +		amdgpu_ucode_sysfs_fini(adev);
> >  }
> >  
> >  void amdgpu_device_fini_late(struct amdgpu_device *adev)
> > @@ -3366,10 +3397,6 @@ void amdgpu_device_fini_late(struct amdgpu_device *adev)
> >  	adev->rmmio = NULL;
> >  	amdgpu_device_doorbell_fini(adev);
> >  
> > -	if (adev->ucode_sysfs_en)
> > -		amdgpu_ucode_sysfs_fini(adev);
> > -
> > -	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
> >  	if (IS_ENABLED(CONFIG_PERF_EVENTS))
> >  		amdgpu_pmu_fini(adev);
> >  	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > index 6271044..e7b6c4a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> > @@ -76,6 +76,9 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
> >  static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
> >  	           amdgpu_mem_info_gtt_used_show, NULL);
> >  
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_total);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_used);
> > +
> >  /**
> >   * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
> >   *
> > @@ -114,6 +117,11 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
> >  		return ret;
> >  	}
> >  
> > +	mutex_lock(&adev->sysfs_files_list_lock);
> > +	list_add_tail(&dev_attr_handle_mem_info_gtt_total.head, &adev->sysfs_files_list);
> > +	list_add_tail(&dev_attr_handle_mem_info_gtt_used.head, &adev->sysfs_files_list);
> > +	mutex_unlock(&adev->sysfs_files_list_lock);
> > +
> >  	return 0;
> >  }
> >  
> > @@ -127,7 +135,6 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
> >   */
> >  static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
> >  {
> > -	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
> >  	struct amdgpu_gtt_mgr *mgr = man->priv;
> >  	spin_lock(&mgr->lock);
> >  	drm_mm_takedown(&mgr->mm);
> > @@ -135,9 +142,6 @@ static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
> >  	kfree(mgr);
> >  	man->priv = NULL;
> >  
> > -	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_total);
> > -	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_used);
> > -
> >  	return 0;
> >  }
> >  
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index ddb4af0c..554fec0 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -2216,6 +2216,8 @@ static DEVICE_ATTR(usbc_pd_fw, S_IRUGO | S_IWUSR,
> >  		   psp_usbc_pd_fw_sysfs_read,
> >  		   psp_usbc_pd_fw_sysfs_write);
> >  
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(usbc_pd_fw);
> > +
> >  
> >  
> >  const struct amd_ip_funcs psp_ip_funcs = {
> > @@ -2242,13 +2244,17 @@ static int psp_sysfs_init(struct amdgpu_device *adev)
> >  
> >  	if (ret)
> >  		DRM_ERROR("Failed to create USBC PD FW control file!");
> > +	else {
> > +		mutex_lock(&adev->sysfs_files_list_lock);
> > +		list_add_tail(&dev_attr_handle_usbc_pd_fw.head, &adev->sysfs_files_list);
> > +		mutex_unlock(&adev->sysfs_files_list_lock);
> > +	}
> >  
> >  	return ret;
> >  }
> >  
> >  static void psp_sysfs_fini(struct amdgpu_device *adev)
> >  {
> > -	device_remove_file(adev->dev, &dev_attr_usbc_pd_fw);
> >  }
> >  
> >  const struct amdgpu_ip_block_version psp_v3_1_ip_block =
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > index 7723937..39c400c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> > @@ -148,6 +148,12 @@ static DEVICE_ATTR(mem_info_vis_vram_used, S_IRUGO,
> >  static DEVICE_ATTR(mem_info_vram_vendor, S_IRUGO,
> >  		   amdgpu_mem_info_vram_vendor, NULL);
> >  
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_total);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_total);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_used);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_used);
> > +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_vendor);

Converting all of these individual attributes to an attribute group
would be a nice thing to do anyway.  Makes your logic much simpler and
less error-prone.

But again, the driver core should do all of the device file removal
stuff automatically for you when your PCI device is removed from the
system _UNLESS_ you are doing crazy things like creating child devices
or messing with raw kobjects or other horrible things that I haven't
read the code to see if you are, but hopefully not :)
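
The fully automatic variant would be to hang the attribute groups off the
driver, so the core creates them before the device is published and
removes them on unbind, roughly (sketch; which attributes belong in the
group is illustrative):

static struct attribute *amdgpu_attrs[] = {
	&dev_attr_product_name.attr,
	&dev_attr_serial_number.attr,
	NULL
};
ATTRIBUTE_GROUPS(amdgpu);

	/* in amdgpu_drv.c: */
	static struct pci_driver amdgpu_kms_pci_driver = {
		/* existing fields unchanged */
		.driver.dev_groups = amdgpu_groups,
	};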

thanks,

greg k-h

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-21  6:03 ` [PATCH v2 1/8] drm: Add dummy page per device or GEM object Andrey Grodzovsky
  2020-06-22  9:35   ` Daniel Vetter
@ 2020-06-22 13:18   ` Christian König
  2020-06-22 14:23     ` Daniel Vetter
  2020-06-22 14:32     ` Andrey Grodzovsky
  1 sibling, 2 replies; 97+ messages in thread
From: Christian König @ 2020-06-22 13:18 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen

On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
> Will be used to reroute CPU mapped BO's page faults once
> device is removed.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>   include/drm/drm_file.h      |  2 ++
>   include/drm/drm_gem.h       |  2 ++
>   4 files changed, 22 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index c4c704e..67c0770 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>   			goto out_prime_destroy;
>   	}
>   
> +	file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!file->dummy_page) {
> +		ret = -ENOMEM;
> +		goto out_prime_destroy;
> +	}
> +
>   	return file;
>   
>   out_prime_destroy:
> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>   	if (dev->driver->postclose)
>   		dev->driver->postclose(dev, file);
>   
> +	__free_page(file->dummy_page);
> +
>   	drm_prime_destroy_file_private(&file->prime);
>   
>   	WARN_ON(!list_empty(&file->event_list));
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 1de2cde..c482e9c 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>   
>   	ret = drm_prime_add_buf_handle(&file_priv->prime,
>   			dma_buf, *handle);
> +
> +	if (!ret) {
> +		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +		if (!obj->dummy_page)
> +			ret = -ENOMEM;
> +	}
> +

While the per-file case still looks acceptable, this is a clear NAK since
it will massively increase the memory needed for a prime-exported object.

I think that this is quite overkill in the first place and for the hot 
unplug case we can just use the global dummy page as well.
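
Roughly like this in the fault handler, just as a sketch (assuming the
usual VM_PFNMAP mapping and a hypothetical page allocated once at DRM
core init; none of these names exist yet):

static struct page *drm_dummy_page;	/* allocated once at core init */

static vm_fault_t drm_gem_unplugged_fault(struct vm_fault *vmf)
{
	/*
	 * The device is gone, so back the faulting address with the
	 * single global dummy page instead of the vanished BO.
	 */
	return vmf_insert_pfn(vmf->vma, vmf->address,
			      page_to_pfn(drm_dummy_page));
}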

Christian.

>   	mutex_unlock(&file_priv->prime.lock);
>   	if (ret)
>   		goto fail;
> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
>   		dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>   	dma_buf = attach->dmabuf;
>   	dma_buf_detach(attach->dmabuf, attach);
> +
> +	__free_page(obj->dummy_page);
> +
>   	/* remove the reference */
>   	dma_buf_put(dma_buf);
>   }
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 19df802..349a658 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -335,6 +335,8 @@ struct drm_file {
>   	 */
>   	struct drm_prime_file_private prime;
>   
> +	struct page *dummy_page;
> +
>   	/* private: */
>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>   	unsigned long lock_count; /* DRI1 legacy lock count */
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 0b37506..47460d1 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -310,6 +310,8 @@ struct drm_gem_object {
>   	 *
>   	 */
>   	const struct drm_gem_object_funcs *funcs;
> +
> +	struct page *dummy_page;
>   };
>   
>   /**


* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-21  6:03 ` [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal Andrey Grodzovsky
  2020-06-22  9:51   ` Daniel Vetter
@ 2020-06-22 13:19   ` Christian König
  1 sibling, 0 replies; 97+ messages in thread
From: Christian König @ 2020-06-22 13:19 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen

On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
> Track sysfs files in a list so they can all be removed during PCI remove;
> otherwise, removing them afterwards crashes because the parent folder
> was already removed during PCI remove.

That looks extremely fishy to me.

It sounds like we just don't remove stuff in the right order.
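
I.e. remove the files from the PCI remove callback, while the parent
sysfs directory still exists, instead of from the last drm_dev_put().
A sketch of what I mean (illustrative only, not a tested patch):

static void amdgpu_pci_remove(struct pci_dev *pdev)
{
	struct drm_device *dev = pci_get_drvdata(pdev);
	struct amdgpu_device *adev = dev->dev_private;

	/*
	 * Tear down sysfs here, while the device's directory is still
	 * registered, rather than from the release path that may run
	 * long after the directory is gone.
	 */
	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
	/* ... rest of the teardown ... */
}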

Christian.

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 13 +++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 +++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 35 ++++++++++++++++++++++++----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 ++++++----
>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 ++++++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 ++++++++++++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 ++++++++++-
>   drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++++---
>   8 files changed, 99 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 604a681..ba3775f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -726,6 +726,15 @@ struct amd_powerplay {
>   
>   #define AMDGPU_RESET_MAGIC_NUM 64
>   #define AMDGPU_MAX_DF_PERFMONS 4
> +
> +struct amdgpu_sysfs_list_node {
> +	struct list_head head;
> +	struct device_attribute *attr;
> +};
> +
> +#define AMDGPU_DEVICE_ATTR_LIST_NODE(_attr) \
> +	struct amdgpu_sysfs_list_node dev_attr_handle_##_attr = {.attr = &dev_attr_##_attr}
> +
>   struct amdgpu_device {
>   	struct device			*dev;
>   	struct drm_device		*ddev;
> @@ -992,6 +1001,10 @@ struct amdgpu_device {
>   	char				product_number[16];
>   	char				product_name[32];
>   	char				serial[16];
> +
> +	struct list_head sysfs_files_list;
> +	struct mutex	 sysfs_files_list_lock;
> +
>   };
>   
>   static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> index fdd52d8..c1549ee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
> @@ -1950,8 +1950,10 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct device *dev,
>   	return snprintf(buf, PAGE_SIZE, "%s\n", ctx->vbios_version);
>   }
>   
> +
>   static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version,
>   		   NULL);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(vbios_version);
>   
>   /**
>    * amdgpu_atombios_fini - free the driver info and callbacks for atombios
> @@ -1972,7 +1974,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
>   	adev->mode_info.atom_context = NULL;
>   	kfree(adev->mode_info.atom_card_info);
>   	adev->mode_info.atom_card_info = NULL;
> -	device_remove_file(adev->dev, &dev_attr_vbios_version);
>   }
>   
>   /**
> @@ -2038,6 +2039,10 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
>   		return ret;
>   	}
>   
> +	mutex_lock(&adev->sysfs_files_list_lock);
> +	list_add_tail(&dev_attr_handle_vbios_version.head, &adev->sysfs_files_list);
> +	mutex_unlock(&adev->sysfs_files_list_lock);
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index e7b9065..3173046 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2928,6 +2928,12 @@ static const struct attribute *amdgpu_dev_attributes[] = {
>   	NULL
>   };
>   
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(product_name);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(product_number);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(serial_number);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(pcie_replay_count);
> +
> +
>   /**
>    * amdgpu_device_init - initialize the driver
>    *
> @@ -3029,6 +3035,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   	INIT_LIST_HEAD(&adev->shadow_list);
>   	mutex_init(&adev->shadow_list_lock);
>   
> +	INIT_LIST_HEAD(&adev->sysfs_files_list);
> +	mutex_init(&adev->sysfs_files_list_lock);
> +
>   	INIT_DELAYED_WORK(&adev->delayed_init_work,
>   			  amdgpu_device_delayed_init_work_handler);
>   	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
> @@ -3281,6 +3290,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   	if (r) {
>   		dev_err(adev->dev, "Could not create amdgpu device attr\n");
>   		return r;
> +	} else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_product_name.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_product_number.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_serial_number.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_pcie_replay_count.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
>   	}
>   
>   	if (IS_ENABLED(CONFIG_PERF_EVENTS))
> @@ -3298,6 +3314,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   	return r;
>   }
>   
> +static void amdgpu_sysfs_remove_files(struct amdgpu_device *adev)
> +{
> +	struct amdgpu_sysfs_list_node *node;
> +
> +	mutex_lock(&adev->sysfs_files_list_lock);
> +	list_for_each_entry(node, &adev->sysfs_files_list, head)
> +		device_remove_file(adev->dev, node->attr);
> +	mutex_unlock(&adev->sysfs_files_list_lock);
> +}
> +
>   /**
>    * amdgpu_device_fini - tear down the driver
>    *
> @@ -3332,6 +3358,11 @@ void amdgpu_device_fini_early(struct amdgpu_device *adev)
>   	amdgpu_fbdev_fini(adev);
>   
>   	amdgpu_irq_fini_early(adev);
> +
> +	amdgpu_sysfs_remove_files(adev);
> +
> +	if (adev->ucode_sysfs_en)
> +		amdgpu_ucode_sysfs_fini(adev);
>   }
>   
>   void amdgpu_device_fini_late(struct amdgpu_device *adev)
> @@ -3366,10 +3397,6 @@ void amdgpu_device_fini_late(struct amdgpu_device *adev)
>   	adev->rmmio = NULL;
>   	amdgpu_device_doorbell_fini(adev);
>   
> -	if (adev->ucode_sysfs_en)
> -		amdgpu_ucode_sysfs_fini(adev);
> -
> -	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
>   	if (IS_ENABLED(CONFIG_PERF_EVENTS))
>   		amdgpu_pmu_fini(adev);
>   	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> index 6271044..e7b6c4a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
> @@ -76,6 +76,9 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
>   static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
>   	           amdgpu_mem_info_gtt_used_show, NULL);
>   
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_total);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_used);
> +
>   /**
>    * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
>    *
> @@ -114,6 +117,11 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
>   		return ret;
>   	}
>   
> +	mutex_lock(&adev->sysfs_files_list_lock);
> +	list_add_tail(&dev_attr_handle_mem_info_gtt_total.head, &adev->sysfs_files_list);
> +	list_add_tail(&dev_attr_handle_mem_info_gtt_used.head, &adev->sysfs_files_list);
> +	mutex_unlock(&adev->sysfs_files_list_lock);
> +
>   	return 0;
>   }
>   
> @@ -127,7 +135,6 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
>    */
>   static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
>   {
> -	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
>   	struct amdgpu_gtt_mgr *mgr = man->priv;
>   	spin_lock(&mgr->lock);
>   	drm_mm_takedown(&mgr->mm);
> @@ -135,9 +142,6 @@ static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
>   	kfree(mgr);
>   	man->priv = NULL;
>   
> -	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_total);
> -	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_used);
> -
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index ddb4af0c..554fec0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -2216,6 +2216,8 @@ static DEVICE_ATTR(usbc_pd_fw, S_IRUGO | S_IWUSR,
>   		   psp_usbc_pd_fw_sysfs_read,
>   		   psp_usbc_pd_fw_sysfs_write);
>   
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(usbc_pd_fw);
> +
>   
>   
>   const struct amd_ip_funcs psp_ip_funcs = {
> @@ -2242,13 +2244,17 @@ static int psp_sysfs_init(struct amdgpu_device *adev)
>   
>   	if (ret)
>   		DRM_ERROR("Failed to create USBC PD FW control file!");
> +	else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_usbc_pd_fw.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
> +	}
>   
>   	return ret;
>   }
>   
>   static void psp_sysfs_fini(struct amdgpu_device *adev)
>   {
> -	device_remove_file(adev->dev, &dev_attr_usbc_pd_fw);
>   }
>   
>   const struct amdgpu_ip_block_version psp_v3_1_ip_block =
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 7723937..39c400c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -148,6 +148,12 @@ static DEVICE_ATTR(mem_info_vis_vram_used, S_IRUGO,
>   static DEVICE_ATTR(mem_info_vram_vendor, S_IRUGO,
>   		   amdgpu_mem_info_vram_vendor, NULL);
>   
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_total);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_total);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_used);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_used);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_vendor);
> +
>   static const struct attribute *amdgpu_vram_mgr_attributes[] = {
>   	&dev_attr_mem_info_vram_total.attr,
>   	&dev_attr_mem_info_vis_vram_total.attr,
> @@ -184,6 +190,15 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
>   	ret = sysfs_create_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
>   	if (ret)
>   		DRM_ERROR("Failed to register sysfs\n");
> +	else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_mem_info_vram_total.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_mem_info_vis_vram_total.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_mem_info_vram_used.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_mem_info_vis_vram_used.head, &adev->sysfs_files_list);
> +		list_add_tail(&dev_attr_handle_mem_info_vram_vendor.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
> +	}
>   
>   	return 0;
>   }
> @@ -198,7 +213,6 @@ static int amdgpu_vram_mgr_init(struct ttm_mem_type_manager *man,
>    */
>   static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
>   {
> -	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
>   	struct amdgpu_vram_mgr *mgr = man->priv;
>   
>   	spin_lock(&mgr->lock);
> @@ -206,7 +220,6 @@ static int amdgpu_vram_mgr_fini(struct ttm_mem_type_manager *man)
>   	spin_unlock(&mgr->lock);
>   	kfree(mgr);
>   	man->priv = NULL;
> -	sysfs_remove_files(&adev->dev->kobj, amdgpu_vram_mgr_attributes);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> index 90610b4..455eaa4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> @@ -272,6 +272,9 @@ static ssize_t amdgpu_xgmi_show_error(struct device *dev,
>   static DEVICE_ATTR(xgmi_device_id, S_IRUGO, amdgpu_xgmi_show_device_id, NULL);
>   static DEVICE_ATTR(xgmi_error, S_IRUGO, amdgpu_xgmi_show_error, NULL);
>   
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_device_id);
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(xgmi_error);
> +
>   static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
>   					 struct amdgpu_hive_info *hive)
>   {
> @@ -285,10 +288,19 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
>   		return ret;
>   	}
>   
> +	mutex_lock(&adev->sysfs_files_list_lock);
> +	list_add_tail(&dev_attr_handle_xgmi_device_id.head, &adev->sysfs_files_list);
> +	mutex_unlock(&adev->sysfs_files_list_lock);
> +
>   	/* Create xgmi error file */
>   	ret = device_create_file(adev->dev, &dev_attr_xgmi_error);
>   	if (ret)
>   		pr_err("failed to create xgmi_error\n");
> +	else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_xgmi_error.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
> +	}
>   
>   
>   	/* Create sysfs link to hive info folder on the first device */
> @@ -325,7 +337,6 @@ static int amdgpu_xgmi_sysfs_add_dev_info(struct amdgpu_device *adev,
>   static void amdgpu_xgmi_sysfs_rem_dev_info(struct amdgpu_device *adev,
>   					  struct amdgpu_hive_info *hive)
>   {
> -	device_remove_file(adev->dev, &dev_attr_xgmi_device_id);
>   	sysfs_remove_link(&adev->dev->kobj, adev->ddev->unique);
>   	sysfs_remove_link(hive->kobj, adev->ddev->unique);
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> index a7b8292..f95b0b2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> +++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> @@ -265,6 +265,8 @@ static ssize_t df_v3_6_get_df_cntr_avail(struct device *dev,
>   /* device attr for available perfmon counters */
>   static DEVICE_ATTR(df_cntr_avail, S_IRUGO, df_v3_6_get_df_cntr_avail, NULL);
>   
> +static AMDGPU_DEVICE_ATTR_LIST_NODE(df_cntr_avail);
> +
>   static void df_v3_6_query_hashes(struct amdgpu_device *adev)
>   {
>   	u32 tmp;
> @@ -299,6 +301,11 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)
>   	ret = device_create_file(adev->dev, &dev_attr_df_cntr_avail);
>   	if (ret)
>   		DRM_ERROR("failed to create file for available df counters\n");
> +	else {
> +		mutex_lock(&adev->sysfs_files_list_lock);
> +		list_add_tail(&dev_attr_handle_df_cntr_avail.head, &adev->sysfs_files_list);
> +		mutex_unlock(&adev->sysfs_files_list_lock);
> +	}
>   
>   	for (i = 0; i < AMDGPU_MAX_DF_PERFMONS; i++)
>   		adev->df_perfmon_config_assign_mask[i] = 0;
> @@ -308,9 +315,6 @@ static void df_v3_6_sw_init(struct amdgpu_device *adev)
>   
>   static void df_v3_6_sw_fini(struct amdgpu_device *adev)
>   {
> -
> -	device_remove_file(adev->dev, &dev_attr_df_cntr_avail);
> -
>   }
>   
>   static void df_v3_6_enable_broadcast_mode(struct amdgpu_device *adev,


* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22  9:35   ` Daniel Vetter
@ 2020-06-22 14:21     ` Pekka Paalanen
  2020-06-22 14:24       ` Daniel Vetter
  2020-11-09 20:34     ` Andrey Grodzovsky
  2020-11-15  6:39     ` Andrey Grodzovsky
  2 siblings, 1 reply; 97+ messages in thread
From: Pekka Paalanen @ 2020-06-22 14:21 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Andrey Grodzovsky, daniel.vetter, michel, dri-devel, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher



On Mon, 22 Jun 2020 11:35:01 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
> > Will be used to reroute CPU-mapped BOs' page faults once
> > the device is removed.
> > 
> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > ---
> >  drivers/gpu/drm/drm_file.c  |  8 ++++++++
> >  drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> >  include/drm/drm_file.h      |  2 ++
> >  include/drm/drm_gem.h       |  2 ++
> >  4 files changed, 22 insertions(+)

...

> > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > index 0b37506..47460d1 100644
> > --- a/include/drm/drm_gem.h
> > +++ b/include/drm/drm_gem.h
> > @@ -310,6 +310,8 @@ struct drm_gem_object {
> >  	 *
> >  	 */
> >  	const struct drm_gem_object_funcs *funcs;
> > +
> > +	struct page *dummy_page;
> >  };  
> 
> I think amdgpu doesn't care, but everyone else still might care somewhat
> about flink. That also shares buffers, so also needs to allocate the
> per-bo dummy page.

Do you really care about making flink not explode on device
hot-unplug? Why not just let flink users die in a fire?
It's not a regression.


Thanks,
pq


* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22 13:18   ` Christian König
@ 2020-06-22 14:23     ` Daniel Vetter
  2020-06-22 14:32     ` Andrey Grodzovsky
  1 sibling, 0 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22 14:23 UTC (permalink / raw)
  To: Christian König
  Cc: Andrey Grodzovsky, Michel Dänzer, dri-devel, Pekka Paalanen,
	amd-gfx list, Alex Deucher

On Mon, Jun 22, 2020 at 3:18 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
> > Will be used to reroute CPU-mapped BOs' page faults once
> > the device is removed.
> >
> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > ---
> >   drivers/gpu/drm/drm_file.c  |  8 ++++++++
> >   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> >   include/drm/drm_file.h      |  2 ++
> >   include/drm/drm_gem.h       |  2 ++
> >   4 files changed, 22 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > index c4c704e..67c0770 100644
> > --- a/drivers/gpu/drm/drm_file.c
> > +++ b/drivers/gpu/drm/drm_file.c
> > @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
> >                       goto out_prime_destroy;
> >       }
> >
> > +     file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > +     if (!file->dummy_page) {
> > +             ret = -ENOMEM;
> > +             goto out_prime_destroy;
> > +     }
> > +
> >       return file;
> >
> >   out_prime_destroy:
> > @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
> >       if (dev->driver->postclose)
> >               dev->driver->postclose(dev, file);
> >
> > +     __free_page(file->dummy_page);
> > +
> >       drm_prime_destroy_file_private(&file->prime);
> >
> >       WARN_ON(!list_empty(&file->event_list));
> > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > index 1de2cde..c482e9c 100644
> > --- a/drivers/gpu/drm/drm_prime.c
> > +++ b/drivers/gpu/drm/drm_prime.c
> > @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
> >
> >       ret = drm_prime_add_buf_handle(&file_priv->prime,
> >                       dma_buf, *handle);
> > +
> > +     if (!ret) {
> > +             obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > +             if (!obj->dummy_page)
> > +                     ret = -ENOMEM;
> > +     }
> > +
>
> While the per-file case still looks acceptable, this is a clear NAK since
> it will massively increase the memory needed for a prime-exported object.
>
> I think this is quite overkill in the first place; for the hot-unplug case
> we can just use the global dummy page as well.

IMO we either don't bother with the per-file dummy page, or we need this.
Half-way doesn't make much sense, since for anything you dma-buf
exported you have no idea whether it left a sandbox or not.

E.g. anything that's shared between client and compositor has a different
security context, so picking the dummy page of either one is the wrong
thing.

If you're worried about the overhead, we can also allocate the dummy
page on demand and SIGBUS if we can't allocate the right one. Then we
just need to track whether a buffer has ever been exported.
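
Roughly like this (just a sketch, hand-waving the locking against
concurrent faults; obj->dummy_page is the field this patch adds, the
rest is made up):

static vm_fault_t drm_gem_dummy_fault(struct vm_fault *vmf)
{
	struct drm_gem_object *obj = vmf->vma->vm_private_data;

	/*
	 * Allocate lazily, on the first fault after unplug, instead
	 * of up front for every exported object.
	 */
	if (!obj->dummy_page)
		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);

	/*
	 * If we can't get the right page, kill the app with SIGBUS
	 * rather than fall back to a page from another security
	 * context.
	 */
	if (!obj->dummy_page)
		return VM_FAULT_SIGBUS;

	return vmf_insert_pfn(vmf->vma, vmf->address,
			      page_to_pfn(obj->dummy_page));
}
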
-Daniel

>
> Christian.
>
> >       mutex_unlock(&file_priv->prime.lock);
> >       if (ret)
> >               goto fail;
> > @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
> >               dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
> >       dma_buf = attach->dmabuf;
> >       dma_buf_detach(attach->dmabuf, attach);
> > +
> > +     __free_page(obj->dummy_page);
> > +
> >       /* remove the reference */
> >       dma_buf_put(dma_buf);
> >   }
> > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > index 19df802..349a658 100644
> > --- a/include/drm/drm_file.h
> > +++ b/include/drm/drm_file.h
> > @@ -335,6 +335,8 @@ struct drm_file {
> >        */
> >       struct drm_prime_file_private prime;
> >
> > +     struct page *dummy_page;
> > +
> >       /* private: */
> >   #if IS_ENABLED(CONFIG_DRM_LEGACY)
> >       unsigned long lock_count; /* DRI1 legacy lock count */
> > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > index 0b37506..47460d1 100644
> > --- a/include/drm/drm_gem.h
> > +++ b/include/drm/drm_gem.h
> > @@ -310,6 +310,8 @@ struct drm_gem_object {
> >        *
> >        */
> >       const struct drm_gem_object_funcs *funcs;
> > +
> > +     struct page *dummy_page;
> >   };
> >
> >   /**
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22 14:21     ` Pekka Paalanen
@ 2020-06-22 14:24       ` Daniel Vetter
  2020-06-22 14:28         ` Pekka Paalanen
  0 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22 14:24 UTC (permalink / raw)
  To: Pekka Paalanen
  Cc: Andrey Grodzovsky, Christian König, Michel Dänzer,
	dri-devel, amd-gfx list, Alex Deucher

On Mon, Jun 22, 2020 at 4:22 PM Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Mon, 22 Jun 2020 11:35:01 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
> > > Will be used to reroute CPU-mapped BOs' page faults once
> > > the device is removed.
> > >
> > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > ---
> > >  drivers/gpu/drm/drm_file.c  |  8 ++++++++
> > >  drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> > >  include/drm/drm_file.h      |  2 ++
> > >  include/drm/drm_gem.h       |  2 ++
> > >  4 files changed, 22 insertions(+)
>
> ...
>
> > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > index 0b37506..47460d1 100644
> > > --- a/include/drm/drm_gem.h
> > > +++ b/include/drm/drm_gem.h
> > > @@ -310,6 +310,8 @@ struct drm_gem_object {
> > >      *
> > >      */
> > >     const struct drm_gem_object_funcs *funcs;
> > > +
> > > +   struct page *dummy_page;
> > >  };
> >
> > I think amdgpu doesn't care, but everyone else still might care somewhat
> > about flink. That also shares buffers, so also needs to allocate the
> > per-bo dummy page.
>
> Do you really care about making flink not explode on device
> > hot-unplug? Why not just let flink users die in a fire?
> It's not a regression.

It's not about exploding, they won't. With flink you can pass a buffer
from one address space to another, so IMO we should avoid false
sharing. E.g. you might happen to write something $secret into a private
buffer, but only $non-secret stuff into shared buffers. Then if you
unplug, your well-kept $secret might suddenly be visible to lots of
other processes you never intended to share it with.

It just feels safer to plug that hole completely.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22 14:24       ` Daniel Vetter
@ 2020-06-22 14:28         ` Pekka Paalanen
  0 siblings, 0 replies; 97+ messages in thread
From: Pekka Paalanen @ 2020-06-22 14:28 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Andrey Grodzovsky, Christian König, Michel Dänzer,
	dri-devel, amd-gfx list, Alex Deucher



On Mon, 22 Jun 2020 16:24:38 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Mon, Jun 22, 2020 at 4:22 PM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Mon, 22 Jun 2020 11:35:01 +0200
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> >  
> > > On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:  
> > > > Will be used to reroute CPU-mapped BOs' page faults once
> > > > the device is removed.
> > > >
> > > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > ---
> > > >  drivers/gpu/drm/drm_file.c  |  8 ++++++++
> > > >  drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> > > >  include/drm/drm_file.h      |  2 ++
> > > >  include/drm/drm_gem.h       |  2 ++
> > > >  4 files changed, 22 insertions(+)  
> >
> > ...
> >  
> > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > index 0b37506..47460d1 100644
> > > > --- a/include/drm/drm_gem.h
> > > > +++ b/include/drm/drm_gem.h
> > > > @@ -310,6 +310,8 @@ struct drm_gem_object {
> > > >      *
> > > >      */
> > > >     const struct drm_gem_object_funcs *funcs;
> > > > +
> > > > +   struct page *dummy_page;
> > > >  };  
> > >
> > > I think amdgpu doesn't care, but everyone else still might care somewhat
> > > about flink. That also shares buffers, so also needs to allocate the
> > > per-bo dummy page.  
> >
> > Do you really care about making flink not explode on device
> > hot-unplug? Why not just let flink users die in a fire?
> > It's not a regression.  
> 
> It's not about exploding, they won't. With flink you can pass a buffer
> from one address space to another, so IMO we should avoid false
> sharing. E.g. you might happen to write something $secret into a private
> buffer, but only $non-secret stuff into shared buffers. Then if you
> unplug, your well-kept $secret might suddenly be visible to lots of
> other processes you never intended to share it with.
>
> It just feels safer to plug that hole completely.

Ah! Ok, I clearly didn't understand the consequences.


Thanks,
pq


* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22 13:18   ` Christian König
  2020-06-22 14:23     ` Daniel Vetter
@ 2020-06-22 14:32     ` Andrey Grodzovsky
  2020-06-22 17:45       ` Christian König
  1 sibling, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-22 14:32 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen


On 6/22/20 9:18 AM, Christian König wrote:
> On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
>> Will be used to reroute CPU-mapped BOs' page faults once
>> the device is removed.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>   include/drm/drm_file.h      |  2 ++
>>   include/drm/drm_gem.h       |  2 ++
>>   4 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index c4c704e..67c0770 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>               goto out_prime_destroy;
>>       }
>>   +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +    if (!file->dummy_page) {
>> +        ret = -ENOMEM;
>> +        goto out_prime_destroy;
>> +    }
>> +
>>       return file;
>>     out_prime_destroy:
>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>       if (dev->driver->postclose)
>>           dev->driver->postclose(dev, file);
>>   +    __free_page(file->dummy_page);
>> +
>>       drm_prime_destroy_file_private(&file->prime);
>>         WARN_ON(!list_empty(&file->event_list));
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index 1de2cde..c482e9c 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>         ret = drm_prime_add_buf_handle(&file_priv->prime,
>>               dma_buf, *handle);
>> +
>> +    if (!ret) {
>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +        if (!obj->dummy_page)
>> +            ret = -ENOMEM;
>> +    }
>> +
>
> While the per-file case still looks acceptable, this is a clear NAK
> since it will massively increase the memory needed for a prime-exported
> object.
>
> I think this is quite overkill in the first place; for the hot-unplug
> case we can just use the global dummy page as well.
>
> Christian.


A global dummy page is fine for read access, but what do you do on write
access? My first approach was indeed to initially map the global dummy
page read-only and mark the vma->vm_flags as !VM_SHARED, assuming this
would trigger the copy-on-write flow in core mm
(https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977) on
the next page fault to the same address triggered by a write access. But
then I realized a new COW page would be allocated for each such mapping,
which is much more wasteful than having a dedicated page per GEM object.
We can indeed optimize by allocating this dummy page on the first page
fault after device disconnect instead of on GEM object creation.
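
For reference, the rejected COW variant looked roughly like this (sketch
only; drm_dummy_page stands for the hypothetical global page):

static vm_fault_t drm_gem_dummy_fault_cow(struct vm_fault *vmf)
{
	/*
	 * Return the global dummy page, mapped read-only into a
	 * non-VM_SHARED vma. The first write access then goes through
	 * the core mm COW path (do_wp_page()), which allocates a fresh
	 * anonymous page for every single mapping, costing more than
	 * one dedicated page per GEM object.
	 */
	get_page(drm_dummy_page);
	vmf->page = drm_dummy_page;
	return 0;
}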

Andrey


>
>> mutex_unlock(&file_priv->prime.lock);
>>       if (ret)
>>           goto fail;
>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
>>           dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>       dma_buf = attach->dmabuf;
>>       dma_buf_detach(attach->dmabuf, attach);
>> +
>> +    __free_page(obj->dummy_page);
>> +
>>       /* remove the reference */
>>       dma_buf_put(dma_buf);
>>   }
>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>> index 19df802..349a658 100644
>> --- a/include/drm/drm_file.h
>> +++ b/include/drm/drm_file.h
>> @@ -335,6 +335,8 @@ struct drm_file {
>>        */
>>       struct drm_prime_file_private prime;
>>   +    struct page *dummy_page;
>> +
>>       /* private: */
>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>       unsigned long lock_count; /* DRI1 legacy lock count */
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index 0b37506..47460d1 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>        *
>>        */
>>       const struct drm_gem_object_funcs *funcs;
>> +
>> +    struct page *dummy_page;
>>   };
>>     /**
>

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-22 11:21     ` Greg KH
@ 2020-06-22 16:07       ` Andrey Grodzovsky
  2020-06-22 16:45         ` Greg KH
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-22 16:07 UTC (permalink / raw)
  To: Greg KH, Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher



On 6/22/20 7:21 AM, Greg KH wrote:
> On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
>> On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
>>> Track sysfs files in a list so they can all be removed during PCI remove;
>>> otherwise, removing them afterwards crashes because the parent folder
>>> was already removed during PCI remove.
> Huh?  That should not happen, do you have a backtrace of that crash?


Two examples are in the attached trace.

Andrey


>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> Uh I thought sysfs just gets yanked completely. Please check with Greg KH
>> whether hand-rolling all this really is the right solution here ... Feels
>> very wrong. I thought this was all supposed to work by adding attributes
>> before publishing the sysfs node, and then letting sysfs clean up
>> everything. Not by cleaning up manually yourself.
> Yes, that is supposed to be the correct thing to do.
>
>> Adding Greg for an authoritative answer.
>> -Daniel
>>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h          | 13 +++++++++++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c |  7 +++++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c   | 35 ++++++++++++++++++++++++----
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 12 ++++++----
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c      |  8 ++++++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 ++++++++++++--
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c     | 13 ++++++++++-
>>>   drivers/gpu/drm/amd/amdgpu/df_v3_6.c         | 10 +++++---
>>>   8 files changed, 99 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 604a681..ba3775f 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -726,6 +726,15 @@ struct amd_powerplay {
>>>   
>>>   #define AMDGPU_RESET_MAGIC_NUM 64
>>>   #define AMDGPU_MAX_DF_PERFMONS 4
>>> +
>>> +struct amdgpu_sysfs_list_node {
>>> +	struct list_head head;
>>> +	struct device_attribute *attr;
>>> +};
> You know we have lists of attributes already, called attribute groups,
> if you really wanted to do something like this.  But I don't think so.
>
> Either way, don't hand-roll your own stuff that the driver core has
> provided for you for a decade or more; that's just foolish :)
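
(For reference, the driver-core facility Greg means can be wired up
roughly like this; dev_groups has existed since v5.4, and the driver
core then creates the files on bind and removes them on unbind or
unplug, with no manual device_remove_file() at all. The group contents
here are purely illustrative:)

static struct attribute *amdgpu_dev_attrs[] = {
	&dev_attr_product_name.attr,
	&dev_attr_product_number.attr,
	&dev_attr_serial_number.attr,
	&dev_attr_pcie_replay_count.attr,
	NULL,
};
ATTRIBUTE_GROUPS(amdgpu_dev);	/* defines amdgpu_dev_groups */

static struct pci_driver amdgpu_kms_pci_driver = {
	.name			= "amdgpu",
	.id_table		= pciidlist,
	.probe			= amdgpu_pci_probe,
	.remove			= amdgpu_pci_remove,
	.driver.dev_groups	= amdgpu_dev_groups,
};
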
>
>>> +
>>> +#define AMDGPU_DEVICE_ATTR_LIST_NODE(_attr) \
>>> +	struct amdgpu_sysfs_list_node dev_attr_handle_##_attr = {.attr = &dev_attr_##_attr}
>>> +
>>>   struct amdgpu_device {
>>>   	struct device			*dev;
>>>   	struct drm_device		*ddev;
>>> @@ -992,6 +1001,10 @@ struct amdgpu_device {
>>>   	char				product_number[16];
>>>   	char				product_name[32];
>>>   	char				serial[16];
>>> +
>>> +	struct list_head sysfs_files_list;
>>> +	struct mutex	 sysfs_files_list_lock;
>>> +
>>>   };
>>>   
>>>   static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
>>> index fdd52d8..c1549ee 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
>>> @@ -1950,8 +1950,10 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct device *dev,
>>>   	return snprintf(buf, PAGE_SIZE, "%s\n", ctx->vbios_version);
>>>   }
>>>   
>>> +
>>>   static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version,
>>>   		   NULL);
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(vbios_version);
>>>   
>>>   /**
>>>    * amdgpu_atombios_fini - free the driver info and callbacks for atombios
>>> @@ -1972,7 +1974,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
>>>   	adev->mode_info.atom_context = NULL;
>>>   	kfree(adev->mode_info.atom_card_info);
>>>   	adev->mode_info.atom_card_info = NULL;
>>> -	device_remove_file(adev->dev, &dev_attr_vbios_version);
>>>   }
>>>   
>>>   /**
>>> @@ -2038,6 +2039,10 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
>>>   		return ret;
>>>   	}
>>>   
>>> +	mutex_lock(&adev->sysfs_files_list_lock);
>>> +	list_add_tail(&dev_attr_handle_vbios_version.head, &adev->sysfs_files_list);
>>> +	mutex_unlock(&adev->sysfs_files_list_lock);
>>> +
>>>   	return 0;
>>>   }
>>>   
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> index e7b9065..3173046 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>> @@ -2928,6 +2928,12 @@ static const struct attribute *amdgpu_dev_attributes[] = {
>>>   	NULL
>>>   };
>>>   
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(product_name);
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(product_number);
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(serial_number);
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(pcie_replay_count);
>>> +
>>> +
>>>   /**
>>>    * amdgpu_device_init - initialize the driver
>>>    *
>>> @@ -3029,6 +3035,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>   	INIT_LIST_HEAD(&adev->shadow_list);
>>>   	mutex_init(&adev->shadow_list_lock);
>>>   
>>> +	INIT_LIST_HEAD(&adev->sysfs_files_list);
>>> +	mutex_init(&adev->sysfs_files_list_lock);
>>> +
>>>   	INIT_DELAYED_WORK(&adev->delayed_init_work,
>>>   			  amdgpu_device_delayed_init_work_handler);
>>>   	INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
>>> @@ -3281,6 +3290,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>   	if (r) {
>>>   		dev_err(adev->dev, "Could not create amdgpu device attr\n");
>>>   		return r;
>>> +	} else {
>>> +		mutex_lock(&adev->sysfs_files_list_lock);
>>> +		list_add_tail(&dev_attr_handle_product_name.head, &adev->sysfs_files_list);
>>> +		list_add_tail(&dev_attr_handle_product_number.head, &adev->sysfs_files_list);
>>> +		list_add_tail(&dev_attr_handle_serial_number.head, &adev->sysfs_files_list);
>>> +		list_add_tail(&dev_attr_handle_pcie_replay_count.head, &adev->sysfs_files_list);
>>> +		mutex_unlock(&adev->sysfs_files_list_lock);
>>>   	}
>>>   
>>>   	if (IS_ENABLED(CONFIG_PERF_EVENTS))
>>> @@ -3298,6 +3314,16 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>   	return r;
>>>   }
>>>   
>>> +static void amdgpu_sysfs_remove_files(struct amdgpu_device *adev)
>>> +{
>>> +	struct amdgpu_sysfs_list_node *node;
>>> +
>>> +	mutex_lock(&adev->sysfs_files_list_lock);
>>> +	list_for_each_entry(node, &adev->sysfs_files_list, head)
>>> +		device_remove_file(adev->dev, node->attr);
>>> +	mutex_unlock(&adev->sysfs_files_list_lock);
>>> +}
>>> +
>>>   /**
>>>    * amdgpu_device_fini - tear down the driver
>>>    *
>>> @@ -3332,6 +3358,11 @@ void amdgpu_device_fini_early(struct amdgpu_device *adev)
>>>   	amdgpu_fbdev_fini(adev);
>>>   
>>>   	amdgpu_irq_fini_early(adev);
>>> +
>>> +	amdgpu_sysfs_remove_files(adev);
>>> +
>>> +	if (adev->ucode_sysfs_en)
>>> +		amdgpu_ucode_sysfs_fini(adev);
>>>   }
>>>   
>>>   void amdgpu_device_fini_late(struct amdgpu_device *adev)
>>> @@ -3366,10 +3397,6 @@ void amdgpu_device_fini_late(struct amdgpu_device *adev)
>>>   	adev->rmmio = NULL;
>>>   	amdgpu_device_doorbell_fini(adev);
>>>   
>>> -	if (adev->ucode_sysfs_en)
>>> -		amdgpu_ucode_sysfs_fini(adev);
>>> -
>>> -	sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
>>>   	if (IS_ENABLED(CONFIG_PERF_EVENTS))
>>>   		amdgpu_pmu_fini(adev);
>>>   	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
>>> index 6271044..e7b6c4a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
>>> @@ -76,6 +76,9 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
>>>   static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
>>>   	           amdgpu_mem_info_gtt_used_show, NULL);
>>>   
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_total);
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_gtt_used);
>>> +
>>>   /**
>>>    * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
>>>    *
>>> @@ -114,6 +117,11 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
>>>   		return ret;
>>>   	}
>>>   
>>> +	mutex_lock(&adev->sysfs_files_list_lock);
>>> +	list_add_tail(&dev_attr_handle_mem_info_gtt_total.head, &adev->sysfs_files_list);
>>> +	list_add_tail(&dev_attr_handle_mem_info_gtt_used.head, &adev->sysfs_files_list);
>>> +	mutex_unlock(&adev->sysfs_files_list_lock);
>>> +
>>>   	return 0;
>>>   }
>>>   
>>> @@ -127,7 +135,6 @@ static int amdgpu_gtt_mgr_init(struct ttm_mem_type_manager *man,
>>>    */
>>>   static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
>>>   {
>>> -	struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
>>>   	struct amdgpu_gtt_mgr *mgr = man->priv;
>>>   	spin_lock(&mgr->lock);
>>>   	drm_mm_takedown(&mgr->mm);
>>> @@ -135,9 +142,6 @@ static int amdgpu_gtt_mgr_fini(struct ttm_mem_type_manager *man)
>>>   	kfree(mgr);
>>>   	man->priv = NULL;
>>>   
>>> -	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_total);
>>> -	device_remove_file(adev->dev, &dev_attr_mem_info_gtt_used);
>>> -
>>>   	return 0;
>>>   }
>>>   
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> index ddb4af0c..554fec0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> @@ -2216,6 +2216,8 @@ static DEVICE_ATTR(usbc_pd_fw, S_IRUGO | S_IWUSR,
>>>   		   psp_usbc_pd_fw_sysfs_read,
>>>   		   psp_usbc_pd_fw_sysfs_write);
>>>   
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(usbc_pd_fw);
>>> +
>>>   
>>>   
>>>   const struct amd_ip_funcs psp_ip_funcs = {
>>> @@ -2242,13 +2244,17 @@ static int psp_sysfs_init(struct amdgpu_device *adev)
>>>   
>>>   	if (ret)
>>>   		DRM_ERROR("Failed to create USBC PD FW control file!");
>>> +	else {
>>> +		mutex_lock(&adev->sysfs_files_list_lock);
>>> +		list_add_tail(&dev_attr_handle_usbc_pd_fw.head, &adev->sysfs_files_list);
>>> +		mutex_unlock(&adev->sysfs_files_list_lock);
>>> +	}
>>>   
>>>   	return ret;
>>>   }
>>>   
>>>   static void psp_sysfs_fini(struct amdgpu_device *adev)
>>>   {
>>> -	device_remove_file(adev->dev, &dev_attr_usbc_pd_fw);
>>>   }
>>>   
>>>   const struct amdgpu_ip_block_version psp_v3_1_ip_block =
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> index 7723937..39c400c 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> @@ -148,6 +148,12 @@ static DEVICE_ATTR(mem_info_vis_vram_used, S_IRUGO,
>>>   static DEVICE_ATTR(mem_info_vram_vendor, S_IRUGO,
>>>   		   amdgpu_mem_info_vram_vendor, NULL);
>>>   
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_total);
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_total);
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_used);
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vis_vram_used);
>>> +static AMDGPU_DEVICE_ATTR_LIST_NODE(mem_info_vram_vendor);
> Converting all of these individual attributes to an attribute group
> would be a nice thing to do anyway.  Makes your logic much simpler and
> less error-prone.
>
> But again, the driver core should do all of the device file removal
> stuff automatically for you when your PCI device is removed from the
> system _UNLESS_ you are doing crazy things like creating child devices
> or messing with raw kobjects or other horrible things; I haven't read
> the code to see if you are, but hopefully not :)
>
> thanks,
>
> greg k-h

[-- Attachment #2: sysfs_oops-1.log --]

[  925.738225 <    0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090
[  925.738232 <    0.000007>] #PF: supervisor read access in kernel mode
[  925.738236 <    0.000004>] #PF: error_code(0x0000) - not-present page
[  925.738240 <    0.000004>] PGD 0 P4D 0 
[  925.738245 <    0.000005>] Oops: 0000 [#1] SMP PTI
[  925.738249 <    0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
[  925.738256 <    0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
[  925.738266 <    0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110
[  925.738270 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
[  925.738282 <    0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246
[  925.738287 <    0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
[  925.738292 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000
[  925.738297 <    0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000
[  925.738302 <    0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000
[  925.738307 <    0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130
[  925.738313 <    0.000006>] FS:  00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000
[  925.738319 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  925.738323 <    0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0
[  925.738329 <    0.000006>] Call Trace:
[  925.738334 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
[  925.738339 <    0.000005>]  sysfs_remove_group+0x25/0x80
[  925.738344 <    0.000005>]  sysfs_remove_groups+0x29/0x40
[  925.738350 <    0.000006>]  free_msi_irqs+0xf5/0x190
[  925.738354 <    0.000004>]  pci_disable_msi+0xe9/0x120
[  925.738406 <    0.000052>]  amdgpu_irq_fini+0xe3/0xf0 [amdgpu]
[  925.738453 <    0.000047>]  tonga_ih_sw_fini+0xe/0x30 [amdgpu]
[  925.738490 <    0.000037>]  amdgpu_device_fini_late+0x14b/0x440 [amdgpu]
[  925.738529 <    0.000039>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
[  925.738548 <    0.000019>]  drm_dev_put+0x5b/0x80 [drm]
[  925.738558 <    0.000010>]  drm_release+0xc6/0xd0 [drm]
[  925.738563 <    0.000005>]  __fput+0xc6/0x260
[  925.738568 <    0.000005>]  task_work_run+0x79/0xb0
[  925.738573 <    0.000005>]  do_exit+0x3d0/0xc60
[  925.738578 <    0.000005>]  do_group_exit+0x47/0xb0
[  925.738583 <    0.000005>]  get_signal+0x18b/0xc30
[  925.738589 <    0.000006>]  do_signal+0x36/0x6a0
[  925.738593 <    0.000004>]  ? force_sig_info_to_task+0xbc/0xd0
[  925.738597 <    0.000004>]  ? signal_wake_up_state+0x15/0x30
[  925.738603 <    0.000006>]  exit_to_usermode_loop+0x6f/0xc0
[  925.738608 <    0.000005>]  prepare_exit_to_usermode+0xc7/0x110
[  925.738613 <    0.000005>]  ret_from_intr+0x25/0x35
[  925.738617 <    0.000004>] RIP: 0033:0x417369
[  925.738621 <    0.000004>] Code: Bad RIP value.
[  925.738625 <    0.000004>] RSP: 002b:00007ffdd6bf0900 EFLAGS: 00010246
[  925.738629 <    0.000004>] RAX: 00007f3eca509000 RBX: 000000000000001e RCX: 00007f3ec95ba260
[  925.738634 <    0.000005>] RDX: 00007f3ec9889790 RSI: 000000000000000a RDI: 0000000000000000
[  925.738639 <    0.000005>] RBP: 00007ffdd6bf0990 R08: 00007f3ec9889780 R09: 00007f3eca4e8700
[  925.738645 <    0.000006>] R10: 000000000000035c R11: 0000000000000246 R12: 00000000021c6170
[  925.738650 <    0.000005>] R13: 00007ffdd6bf0c00 R14: 0000000000000000 R15: 0000000000000000




[   40.880899 <    0.000004>] BUG: kernel NULL pointer dereference, address: 0000000000000090
[   40.880906 <    0.000007>] #PF: supervisor read access in kernel mode
[   40.880910 <    0.000004>] #PF: error_code(0x0000) - not-present page
[   40.880915 <    0.000005>] PGD 0 P4D 0 
[   40.880920 <    0.000005>] Oops: 0000 [#1] SMP PTI
[   40.880924 <    0.000004>] CPU: 1 PID: 2526 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
[   40.880932 <    0.000008>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
[   40.880941 <    0.000009>] RIP: 0010:kernfs_find_ns+0x18/0x110
[   40.880945 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
[   40.880957 <    0.000012>] RSP: 0018:ffffaf3380467ba8 EFLAGS: 00010246
[   40.880963 <    0.000006>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
[   40.880968 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffc0678cfc RDI: 0000000000000000
[   40.880973 <    0.000005>] RBP: ffffffffc0678cfc R08: ffffffffaa379d10 R09: 0000000000000000
[   40.880979 <    0.000006>] R10: ffffaf3380467be0 R11: ffff93547615d128 R12: 0000000000000000
[   40.880984 <    0.000005>] R13: 0000000000000000 R14: ffffffffc0678cfc R15: ffff93549be86130
[   40.880990 <    0.000006>] FS:  00007fd9ecb10700(0000) GS:ffff9354bd840000(0000) knlGS:0000000000000000
[   40.880996 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   40.881001 <    0.000005>] CR2: 0000000000000090 CR3: 0000000072866001 CR4: 00000000000606e0
[   40.881006 <    0.000005>] Call Trace:
[   40.881011 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
[   40.881016 <    0.000005>]  sysfs_remove_group+0x25/0x80
[   40.881055 <    0.000039>]  amdgpu_device_fini_late+0x3eb/0x440 [amdgpu]
[   40.881095 <    0.000040>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
[   40.881109 <    0.000014>]  drm_dev_put+0x5b/0x80 [drm]
[   40.881119 <    0.000010>]  drm_release+0xc6/0xd0 [drm]
[   40.881124 <    0.000005>]  __fput+0xc6/0x260
[   40.881129 <    0.000005>]  task_work_run+0x79/0xb0
[   40.881134 <    0.000005>]  do_exit+0x3d0/0xc60
[   40.881138 <    0.000004>]  do_group_exit+0x47/0xb0
[   40.881143 <    0.000005>]  get_signal+0x18b/0xc30
[   40.881149 <    0.000006>]  do_signal+0x36/0x6a0
[   40.881153 <    0.000004>]  ? force_sig_info_to_task+0xbc/0xd0
[   40.881158 <    0.000005>]  ? signal_wake_up_state+0x15/0x30
[   40.881164 <    0.000006>]  exit_to_usermode_loop+0x6f/0xc0
[   40.881170 <    0.000006>]  prepare_exit_to_usermode+0xc7/0x110
[   40.881176 <    0.000006>]  ret_from_intr+0x25/0x35
[   40.881181 <    0.000005>] RIP: 0033:0x417369
[   40.881185 <    0.000004>] Code: Bad RIP value.
[   40.881188 <    0.000003>] RSP: 002b:00007ffd6a742f90 EFLAGS: 00010246
[   40.881193 <    0.000005>] RAX: 00007fd9ecb31000 RBX: 000000000000001e RCX: 00007fd9ebbe2260
[   40.881199 <    0.000006>] RDX: 00007fd9ebeb1790 RSI: 000000000000000a RDI: 0000000000000000
[   40.881204 <    0.000005>] RBP: 00007ffd6a743020 R08: 00007fd9ebeb1780 R09: 00007fd9ecb10700
[   40.881210 <    0.000006>] R10: 000000000000035c R11: 0000000000000246 R12: 00000000023e0170
[   40.881215 <    0.000005>] R13: 00007ffd6a743290 R14: 0000000000000000 R15: 0000000000000000




* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-22 16:07       ` Andrey Grodzovsky
@ 2020-06-22 16:45         ` Greg KH
  2020-06-23  4:51           ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Greg KH @ 2020-06-22 16:45 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
> 
> On 6/22/20 7:21 AM, Greg KH wrote:
> > On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
> > > On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
> > > > Track sysfs files in a list so they can all be removed during PCI remove;
> > > > otherwise, removing them afterwards crashes because the parent folder
> > > > was already removed during PCI remove.
> > Huh?  That should not happen, do you have a backtrace of that crash?
> 
> 
> 2 examples in the attached trace.

Odd, how did you trigger these?


> [  925.738225 <    0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090
> [  925.738232 <    0.000007>] #PF: supervisor read access in kernel mode
> [  925.738236 <    0.000004>] #PF: error_code(0x0000) - not-present page
> [  925.738240 <    0.000004>] PGD 0 P4D 0 
> [  925.738245 <    0.000005>] Oops: 0000 [#1] SMP PTI
> [  925.738249 <    0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
> [  925.738256 <    0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
> [  925.738266 <    0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110
> [  925.738270 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
> [  925.738282 <    0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246
> [  925.738287 <    0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
> [  925.738292 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000
> [  925.738297 <    0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000
> [  925.738302 <    0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000
> [  925.738307 <    0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130
> [  925.738313 <    0.000006>] FS:  00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000
> [  925.738319 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  925.738323 <    0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0
> [  925.738329 <    0.000006>] Call Trace:
> [  925.738334 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
> [  925.738339 <    0.000005>]  sysfs_remove_group+0x25/0x80
> [  925.738344 <    0.000005>]  sysfs_remove_groups+0x29/0x40
> [  925.738350 <    0.000006>]  free_msi_irqs+0xf5/0x190
> [  925.738354 <    0.000004>]  pci_disable_msi+0xe9/0x120

So the PCI core is trying to clean up attributes that it had registered,
which is fine.  But we can't seem to find the attributes?  Were they
already removed somewhere else?

that's odd.

> [  925.738406 <    0.000052>]  amdgpu_irq_fini+0xe3/0xf0 [amdgpu]
> [  925.738453 <    0.000047>]  tonga_ih_sw_fini+0xe/0x30 [amdgpu]
> [  925.738490 <    0.000037>]  amdgpu_device_fini_late+0x14b/0x440 [amdgpu]
> [  925.738529 <    0.000039>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
> [  925.738548 <    0.000019>]  drm_dev_put+0x5b/0x80 [drm]
> [  925.738558 <    0.000010>]  drm_release+0xc6/0xd0 [drm]
> [  925.738563 <    0.000005>]  __fput+0xc6/0x260
> [  925.738568 <    0.000005>]  task_work_run+0x79/0xb0
> [  925.738573 <    0.000005>]  do_exit+0x3d0/0xc60
> [  925.738578 <    0.000005>]  do_group_exit+0x47/0xb0
> [  925.738583 <    0.000005>]  get_signal+0x18b/0xc30
> [  925.738589 <    0.000006>]  do_signal+0x36/0x6a0
> [  925.738593 <    0.000004>]  ? force_sig_info_to_task+0xbc/0xd0
> [  925.738597 <    0.000004>]  ? signal_wake_up_state+0x15/0x30
> [  925.738603 <    0.000006>]  exit_to_usermode_loop+0x6f/0xc0
> [  925.738608 <    0.000005>]  prepare_exit_to_usermode+0xc7/0x110
> [  925.738613 <    0.000005>]  ret_from_intr+0x25/0x35
> [  925.738617 <    0.000004>] RIP: 0033:0x417369
> [  925.738621 <    0.000004>] Code: Bad RIP value.
> [  925.738625 <    0.000004>] RSP: 002b:00007ffdd6bf0900 EFLAGS: 00010246
> [  925.738629 <    0.000004>] RAX: 00007f3eca509000 RBX: 000000000000001e RCX: 00007f3ec95ba260
> [  925.738634 <    0.000005>] RDX: 00007f3ec9889790 RSI: 000000000000000a RDI: 0000000000000000
> [  925.738639 <    0.000005>] RBP: 00007ffdd6bf0990 R08: 00007f3ec9889780 R09: 00007f3eca4e8700
> [  925.738645 <    0.000006>] R10: 000000000000035c R11: 0000000000000246 R12: 00000000021c6170
> [  925.738650 <    0.000005>] R13: 00007ffdd6bf0c00 R14: 0000000000000000 R15: 0000000000000000
> 
> 
> 
> 
> [   40.880899 <    0.000004>] BUG: kernel NULL pointer dereference, address: 0000000000000090
> [   40.880906 <    0.000007>] #PF: supervisor read access in kernel mode
> [   40.880910 <    0.000004>] #PF: error_code(0x0000) - not-present page
> [   40.880915 <    0.000005>] PGD 0 P4D 0 
> [   40.880920 <    0.000005>] Oops: 0000 [#1] SMP PTI
> [   40.880924 <    0.000004>] CPU: 1 PID: 2526 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
> [   40.880932 <    0.000008>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
> [   40.880941 <    0.000009>] RIP: 0010:kernfs_find_ns+0x18/0x110
> [   40.880945 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
> [   40.880957 <    0.000012>] RSP: 0018:ffffaf3380467ba8 EFLAGS: 00010246
> [   40.880963 <    0.000006>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
> [   40.880968 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffc0678cfc RDI: 0000000000000000
> [   40.880973 <    0.000005>] RBP: ffffffffc0678cfc R08: ffffffffaa379d10 R09: 0000000000000000
> [   40.880979 <    0.000006>] R10: ffffaf3380467be0 R11: ffff93547615d128 R12: 0000000000000000
> [   40.880984 <    0.000005>] R13: 0000000000000000 R14: ffffffffc0678cfc R15: ffff93549be86130
> [   40.880990 <    0.000006>] FS:  00007fd9ecb10700(0000) GS:ffff9354bd840000(0000) knlGS:0000000000000000
> [   40.880996 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   40.881001 <    0.000005>] CR2: 0000000000000090 CR3: 0000000072866001 CR4: 00000000000606e0
> [   40.881006 <    0.000005>] Call Trace:
> [   40.881011 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
> [   40.881016 <    0.000005>]  sysfs_remove_group+0x25/0x80
> [   40.881055 <    0.000039>]  amdgpu_device_fini_late+0x3eb/0x440 [amdgpu]
> [   40.881095 <    0.000040>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]

Here it's your driver doing the same thing, removing attributes it
created.  But again they are not there.

So something went through and wiped the tree clean, which if I'm reading
this correctly, your patch would not solve as you would try to also
remove attributes that were already removed, right?

And 5.5-rc7 is a bit old (6 months and many thousands of changes ago),
does this still happen on a modern, released kernel?

thanks,

greg k-h

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22 14:32     ` Andrey Grodzovsky
@ 2020-06-22 17:45       ` Christian König
  2020-06-22 17:50         ` Daniel Vetter
  0 siblings, 1 reply; 97+ messages in thread
From: Christian König @ 2020-06-22 17:45 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen

Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
>
> On 6/22/20 9:18 AM, Christian König wrote:
>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>> Will be used to reroute CPU mapped BO's page faults once
>>> device is removed.
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>   include/drm/drm_file.h      |  2 ++
>>>   include/drm/drm_gem.h       |  2 ++
>>>   4 files changed, 22 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>> index c4c704e..67c0770 100644
>>> --- a/drivers/gpu/drm/drm_file.c
>>> +++ b/drivers/gpu/drm/drm_file.c
>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct 
>>> drm_minor *minor)
>>>               goto out_prime_destroy;
>>>       }
>>>   +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +    if (!file->dummy_page) {
>>> +        ret = -ENOMEM;
>>> +        goto out_prime_destroy;
>>> +    }
>>> +
>>>       return file;
>>>     out_prime_destroy:
>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>       if (dev->driver->postclose)
>>>           dev->driver->postclose(dev, file);
>>>   +    __free_page(file->dummy_page);
>>> +
>>>       drm_prime_destroy_file_private(&file->prime);
>>>         WARN_ON(!list_empty(&file->event_list));
>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>> index 1de2cde..c482e9c 100644
>>> --- a/drivers/gpu/drm/drm_prime.c
>>> +++ b/drivers/gpu/drm/drm_prime.c
>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct 
>>> drm_device *dev,
>>>         ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>               dma_buf, *handle);
>>> +
>>> +    if (!ret) {
>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +        if (!obj->dummy_page)
>>> +            ret = -ENOMEM;
>>> +    }
>>> +
>>
>> While the per file case still looks acceptable this is a clear NAK 
>> since it will massively increase the memory needed for a prime 
>> exported object.
>>
>> I think that this is quite overkill in the first place and for the 
>> hot unplug case we can just use the global dummy page as well.
>>
>> Christian.
>
>
> Global dummy page is good for read access, what do you do on write 
> access ? My first approach was indeed to map at first global dummy 
> page as read only and mark the vma->vm_flags as !VM_SHARED assuming 
> that this would trigger Copy On Write flow in core mm 
> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977) 
> on the next page fault to same address triggered by a write access but 
> then i realized a new COW page will be allocated for each such mapping 
> and this is much more wasteful then having a dedicated page per GEM 
> object. 

Yeah, but this is only for a very, very small corner case. What we need 
to prevent is increasing the memory usage during normal operation too much.

Using memory during the unplug is completely unproblematic because we 
just released quite a bunch of it by releasing all those system memory 
buffers.

And I'm pretty sure that COWed pages are correctly accounted towards the 
used memory of a process.

So I think if that approach works as intended and the COW pages are 
released again on unmapping it would be the perfect solution to the problem.

Daniel what do you think?

Regards,
Christian.
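
The idea in code, very roughly (global_dummy_page is an illustrative
name, and whether an established vma can be switched over like this is
exactly the open question):

	/* in the fault handler, after the device is gone: insert the
	 * shared dummy page read-only and force writes through COW */
	vma->vm_flags &= ~VM_SHARED;
	get_page(global_dummy_page);
	vmf->page = global_dummy_page;
	return 0;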

> We can indeed optimize by allocating this dummy page on the first page 
> fault after device disconnect instead on GEM object creation.
>
> Andrey
>
>
>>
>>> mutex_unlock(&file_priv->prime.lock);
>>>       if (ret)
>>>           goto fail;
>>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct 
>>> drm_gem_object *obj, struct sg_table *sg)
>>>           dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>       dma_buf = attach->dmabuf;
>>>       dma_buf_detach(attach->dmabuf, attach);
>>> +
>>> +    __free_page(obj->dummy_page);
>>> +
>>>       /* remove the reference */
>>>       dma_buf_put(dma_buf);
>>>   }
>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>> index 19df802..349a658 100644
>>> --- a/include/drm/drm_file.h
>>> +++ b/include/drm/drm_file.h
>>> @@ -335,6 +335,8 @@ struct drm_file {
>>>        */
>>>       struct drm_prime_file_private prime;
>>>   +    struct page *dummy_page;
>>> +
>>>       /* private: */
>>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>       unsigned long lock_count; /* DRI1 legacy lock count */
>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>> index 0b37506..47460d1 100644
>>> --- a/include/drm/drm_gem.h
>>> +++ b/include/drm/drm_gem.h
>>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>>        *
>>>        */
>>>       const struct drm_gem_object_funcs *funcs;
>>> +
>>> +    struct page *dummy_page;
>>>   };
>>>     /**
>>


* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22 17:45       ` Christian König
@ 2020-06-22 17:50         ` Daniel Vetter
  2020-11-09 20:53           ` Andrey Grodzovsky
  2020-11-13 20:52           ` Andrey Grodzovsky
  0 siblings, 2 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-06-22 17:50 UTC (permalink / raw)
  To: Christian König
  Cc: Andrey Grodzovsky, Michel Dänzer, dri-devel, Pekka Paalanen,
	amd-gfx list, Alex Deucher

On Mon, Jun 22, 2020 at 7:45 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
> >
> > On 6/22/20 9:18 AM, Christian König wrote:
> >> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
> >>> Will be used to reroute CPU mapped BO's page faults once
> >>> device is removed.
> >>>
> >>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>> ---
> >>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
> >>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> >>>   include/drm/drm_file.h      |  2 ++
> >>>   include/drm/drm_gem.h       |  2 ++
> >>>   4 files changed, 22 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >>> index c4c704e..67c0770 100644
> >>> --- a/drivers/gpu/drm/drm_file.c
> >>> +++ b/drivers/gpu/drm/drm_file.c
> >>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
> >>> drm_minor *minor)
> >>>               goto out_prime_destroy;
> >>>       }
> >>>   +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> >>> +    if (!file->dummy_page) {
> >>> +        ret = -ENOMEM;
> >>> +        goto out_prime_destroy;
> >>> +    }
> >>> +
> >>>       return file;
> >>>     out_prime_destroy:
> >>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
> >>>       if (dev->driver->postclose)
> >>>           dev->driver->postclose(dev, file);
> >>>   +    __free_page(file->dummy_page);
> >>> +
> >>>       drm_prime_destroy_file_private(&file->prime);
> >>>         WARN_ON(!list_empty(&file->event_list));
> >>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> >>> index 1de2cde..c482e9c 100644
> >>> --- a/drivers/gpu/drm/drm_prime.c
> >>> +++ b/drivers/gpu/drm/drm_prime.c
> >>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
> >>> drm_device *dev,
> >>>         ret = drm_prime_add_buf_handle(&file_priv->prime,
> >>>               dma_buf, *handle);
> >>> +
> >>> +    if (!ret) {
> >>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> >>> +        if (!obj->dummy_page)
> >>> +            ret = -ENOMEM;
> >>> +    }
> >>> +
> >>
> >> While the per file case still looks acceptable this is a clear NAK
> >> since it will massively increase the memory needed for a prime
> >> exported object.
> >>
> >> I think that this is quite overkill in the first place and for the
> >> hot unplug case we can just use the global dummy page as well.
> >>
> >> Christian.
> >
> >
> > Global dummy page is good for read access, what do you do on write
> > access ? My first approach was indeed to map at first global dummy
> > page as read only and mark the vma->vm_flags as !VM_SHARED assuming
> > that this would trigger Copy On Write flow in core mm
> > (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
> > on the next page fault to same address triggered by a write access but
> > then i realized a new COW page will be allocated for each such mapping
> > and this is much more wasteful then having a dedicated page per GEM
> > object.
>
> Yeah, but this is only for a very, very small corner case. What we need
> to prevent is increasing the memory usage during normal operation too much.
>
> Using memory during the unplug is completely unproblematic because we
> just released quite a bunch of it by releasing all those system memory
> buffers.
>
> And I'm pretty sure that COWed pages are correctly accounted towards the
> used memory of a process.
>
> So I think if that approach works as intended and the COW pages are
> released again on unmapping it would be the perfect solution to the problem.
>
> Daniel what do you think?

If COW works, sure sounds reasonable. And if we can make sure we
managed to drop all the system allocations (otherwise suddenly 2x
memory usage, worst case). But I have no idea whether we can
retroshoehorn that into an established vma, you might have fun stuff
like a mkwrite handler there (which I thought is the COW handler
thing, but really no idea).

If we need to massively change stuff then I think rw dummy page,
allocated on first fault after hotunplug (maybe just make it one per
object, that's simplest) seems like the much safer option. Much less
code that can go wrong.
-Daniel
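
A minimal sketch of that safer option, a per-object dummy page
allocated lazily on the first fault after unplug (the dummy_page field,
helper name and locking are assumptions for illustration, not the
actual patch):

static vm_fault_t ttm_bo_vm_dummy_fault(struct ttm_buffer_object *bo,
					struct vm_fault *vmf)
{
	struct page *page = READ_ONCE(bo->base.dummy_page);

	if (!page) {
		page = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!page)
			return VM_FAULT_OOM;
		/* another fault may have raced us; keep the winner */
		if (cmpxchg(&bo->base.dummy_page, NULL, page)) {
			__free_page(page);
			page = bo->base.dummy_page;
		}
	}
	get_page(page);		/* reference dropped by the core mm on unmap */
	vmf->page = page;
	return 0;
}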

> Regards,
> Christian.
>
> > We can indeed optimize by allocating this dummy page on the first page
> > fault after device disconnect instead on GEM object creation.
> >
> > Andrey
> >
> >
> >>
> >>> mutex_unlock(&file_priv->prime.lock);
> >>>       if (ret)
> >>>           goto fail;
> >>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct
> >>> drm_gem_object *obj, struct sg_table *sg)
> >>>           dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
> >>>       dma_buf = attach->dmabuf;
> >>>       dma_buf_detach(attach->dmabuf, attach);
> >>> +
> >>> +    __free_page(obj->dummy_page);
> >>> +
> >>>       /* remove the reference */
> >>>       dma_buf_put(dma_buf);
> >>>   }
> >>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> >>> index 19df802..349a658 100644
> >>> --- a/include/drm/drm_file.h
> >>> +++ b/include/drm/drm_file.h
> >>> @@ -335,6 +335,8 @@ struct drm_file {
> >>>        */
> >>>       struct drm_prime_file_private prime;
> >>>   +    struct page *dummy_page;
> >>> +
> >>>       /* private: */
> >>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
> >>>       unsigned long lock_count; /* DRI1 legacy lock count */
> >>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> >>> index 0b37506..47460d1 100644
> >>> --- a/include/drm/drm_gem.h
> >>> +++ b/include/drm/drm_gem.h
> >>> @@ -310,6 +310,8 @@ struct drm_gem_object {
> >>>        *
> >>>        */
> >>>       const struct drm_gem_object_funcs *funcs;
> >>> +
> >>> +    struct page *dummy_page;
> >>>   };
> >>>     /**
> >>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page.
  2020-06-21  6:03 ` [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page Andrey Grodzovsky
  2020-06-22  9:41   ` Daniel Vetter
@ 2020-06-22 19:30   ` Christian König
  1 sibling, 0 replies; 97+ messages in thread
From: Christian König @ 2020-06-22 19:30 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen

Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
> On device removal reroute all CPU mappings to dummy page per drm_file
> instance or imported GEM object.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_bo_vm.c | 65 ++++++++++++++++++++++++++++++++++++-----
>   1 file changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 389128b..2f8bf5e 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -35,6 +35,8 @@
>   #include <drm/ttm/ttm_bo_driver.h>
>   #include <drm/ttm/ttm_placement.h>
>   #include <drm/drm_vma_manager.h>
> +#include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
>   #include <linux/mm.h>
>   #include <linux/pfn_t.h>
>   #include <linux/rbtree.h>
> @@ -328,19 +330,66 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
>   	pgprot_t prot;
>   	struct ttm_buffer_object *bo = vma->vm_private_data;
>   	vm_fault_t ret;
> +	int idx;
> +	struct drm_device *ddev = bo->base.dev;
>   
> -	ret = ttm_bo_vm_reserve(bo, vmf);
> -	if (ret)
> -		return ret;
> +	if (drm_dev_enter(ddev, &idx)) {

Better do this like if (!drm_dev_enter(...)) return ttm_bo_vm_dummy(..);

This way you can move all the dummy fault handling into a separate 
function without cluttering this one here too much.

Christian.
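
Roughly this shape, with ttm_bo_vm_dummy() as a hypothetical helper
holding the whole dummy-page branch (a sketch of the suggestion, not
tested code):

vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
{
	struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
	struct drm_device *ddev = bo->base.dev;
	pgprot_t prot;
	vm_fault_t ret;
	int idx;

	if (!drm_dev_enter(ddev, &idx))
		return ttm_bo_vm_dummy(vmf);	/* device gone, reroute */

	ret = ttm_bo_vm_reserve(bo, vmf);
	if (ret)
		goto exit;

	prot = vmf->vma->vm_page_prot;
	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
		goto exit;

	dma_resv_unlock(bo->base.resv);
exit:
	drm_dev_exit(idx);
	return ret;
}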

> +		ret = ttm_bo_vm_reserve(bo, vmf);
> +		if (ret)
> +			goto exit;
> +
> +		prot = vma->vm_page_prot;
>   
> -	prot = vma->vm_page_prot;
> -	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
> -	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> +		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
> +		if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> +			goto exit;
> +
> +		dma_resv_unlock(bo->base.resv);
> +
> +exit:
> +		drm_dev_exit(idx);
>   		return ret;
> +	} else {
>   
> -	dma_resv_unlock(bo->base.resv);
> +		struct drm_file *file = NULL;
> +		struct page *dummy_page = NULL;
> +		int handle;
>   
> -	return ret;
> +		/* We are faulting on imported BO from dma_buf */
> +		if (bo->base.dma_buf && bo->base.import_attach) {
> +			dummy_page = bo->base.dummy_page;
> +		/* We are faulting on non imported BO, find drm_file owning the BO*/
> +		} else {
> +			struct drm_gem_object *gobj;
> +
> +			mutex_lock(&ddev->filelist_mutex);
> +			list_for_each_entry(file, &ddev->filelist, lhead) {
> +				spin_lock(&file->table_lock);
> +				idr_for_each_entry(&file->object_idr, gobj, handle) {
> +					if (gobj == &bo->base) {
> +						dummy_page = file->dummy_page;
> +						break;
> +					}
> +				}
> +				spin_unlock(&file->table_lock);
> +			}
> +			mutex_unlock(&ddev->filelist_mutex);
> +		}
> +
> +		if (dummy_page) {
> +			/*
> +			 * Let do_fault complete the PTE install e.t.c using vmf->page
> +			 *
> +			 * TODO - should i call free_page somewhere ?
> +			 */
> +			get_page(dummy_page);
> +			vmf->page = dummy_page;
> +			return 0;
> +		} else {
> +			return VM_FAULT_SIGSEGV;
> +		}
> +	}
>   }
>   EXPORT_SYMBOL(ttm_bo_vm_fault);
>   


* Re: [PATCH v2 3/8] drm/ttm: Add unampping of the entire device address space
  2020-06-21  6:03 ` [PATCH v2 3/8] drm/ttm: Add unampping of the entire device address space Andrey Grodzovsky
  2020-06-22  9:45   ` Daniel Vetter
@ 2020-06-22 19:37   ` Christian König
  2020-06-22 19:47   ` Alex Deucher
  2 siblings, 0 replies; 97+ messages in thread
From: Christian König @ 2020-06-22 19:37 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen

Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
> Helper function to be used to invalidate all BOs CPU mappings
> once device is removed.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/ttm/ttm_bo.c    | 8 ++++++--
>   include/drm/ttm/ttm_bo_driver.h | 7 +++++++
>   2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index c5b516f..926a365 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1750,10 +1750,14 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
>   	ttm_bo_unmap_virtual_locked(bo);
>   	ttm_mem_io_unlock(man);
>   }
> -
> -
>   EXPORT_SYMBOL(ttm_bo_unmap_virtual);
>   
> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
> +{
> +	unmap_mapping_range(bdev->dev_mapping, 0, 0, 1);
> +}
> +EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
> +
>   int ttm_bo_wait(struct ttm_buffer_object *bo,
>   		bool interruptible, bool no_wait)
>   {
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index c9e0fd0..39ea44f 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -601,6 +601,13 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
>   void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
>   
>   /**
> + * ttm_bo_unmap_virtual_address_space
> + *
> + * @bdev: tear down all the virtual mappings for this device
> + */
> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev);
> +
> +/**
>    * ttm_bo_unmap_virtual
>    *
>    * @bo: tear down the virtual mappings for this BO


* Re: [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove.
  2020-06-21  6:03 ` [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove Andrey Grodzovsky
  2020-06-22  9:56   ` Daniel Vetter
@ 2020-06-22 19:38   ` Christian König
  2020-06-22 19:48     ` Alex Deucher
  1 sibling, 1 reply; 97+ messages in thread
From: Christian König @ 2020-06-22 19:38 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen

Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
> Use the new TTM interface to invalidate all exsisting BO CPU mappings
> form all user proccesses.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

I think those two patches could already land in amd-staging-drm-next 
since they are a good idea independent of how else we fix the other issues.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 43592dc..6932d75 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1135,6 +1135,7 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>   	struct drm_device *dev = pci_get_drvdata(pdev);
>   
>   	drm_dev_unplug(dev);
> +	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>   	amdgpu_driver_unload_kms(dev);
>   
>   	pci_disable_device(pdev);


* Re: [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug
  2020-06-21  6:03 ` [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug Andrey Grodzovsky
  2020-06-22  9:55   ` Daniel Vetter
@ 2020-06-22 19:40   ` Christian König
  2020-06-23  5:11     ` Andrey Grodzovsky
  1 sibling, 1 reply; 97+ messages in thread
From: Christian König @ 2020-06-22 19:40 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen

Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
> entity->rq becomes null aftre device unplugged so just return early
> in that case.

Mhm, do you have a backtrace for this?

This should only be called by an IOCTL and IOCTLs should already call 
drm_dev_enter()/exit() on their own...

Christian.
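
For reference, the idiom referred to here looks roughly like this at
the ioctl level (illustrative names, not actual amdgpu code):

static int amdgpu_example_ioctl(struct drm_device *dev, void *data,
				struct drm_file *filp)
{
	int r, idx;

	if (!drm_dev_enter(dev, &idx))
		return -ENODEV;		/* device already unplugged */

	r = amdgpu_example_do_work(dev, data);	/* hypothetical worker */

	drm_dev_exit(idx);
	return r;
}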

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 21 ++++++++++++++++-----
>   1 file changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> index 8d9c6fe..d252427 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> @@ -24,6 +24,7 @@
>   #include "amdgpu_job.h"
>   #include "amdgpu_object.h"
>   #include "amdgpu_trace.h"
> +#include <drm/drm_drv.h>
>   
>   #define AMDGPU_VM_SDMA_MIN_NUM_DW	256u
>   #define AMDGPU_VM_SDMA_MAX_NUM_DW	(16u * 1024u)
> @@ -94,7 +95,12 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>   	struct drm_sched_entity *entity;
>   	struct amdgpu_ring *ring;
>   	struct dma_fence *f;
> -	int r;
> +	int r, idx;
> +
> +	if (!drm_dev_enter(p->adev->ddev, &idx)) {
> +		r = -ENODEV;
> +		goto nodev;
> +	}
>   
>   	entity = p->immediate ? &p->vm->immediate : &p->vm->delayed;
>   	ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
> @@ -104,7 +110,7 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>   	WARN_ON(ib->length_dw > p->num_dw_left);
>   	r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);
>   	if (r)
> -		goto error;
> +		goto job_fail;
>   
>   	if (p->unlocked) {
>   		struct dma_fence *tmp = dma_fence_get(f);
> @@ -118,10 +124,15 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>   	if (fence && !p->immediate)
>   		swap(*fence, f);
>   	dma_fence_put(f);
> -	return 0;
>   
> -error:
> -	amdgpu_job_free(p->job);
> +	r = 0;
> +
> +job_fail:
> +	drm_dev_exit(idx);
> +nodev:
> +	if (r)
> +		amdgpu_job_free(p->job);
> +
>   	return r;
>   }
>   


* Re: [PATCH v2 3/8] drm/ttm: Add unampping of the entire device address space
  2020-06-21  6:03 ` [PATCH v2 3/8] drm/ttm: Add unampping of the entire device address space Andrey Grodzovsky
  2020-06-22  9:45   ` Daniel Vetter
  2020-06-22 19:37   ` Christian König
@ 2020-06-22 19:47   ` Alex Deucher
  2 siblings, 0 replies; 97+ messages in thread
From: Alex Deucher @ 2020-06-22 19:47 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Daniel Vetter, Michel Dänzer, Maling list - DRI developers,
	Pekka Paalanen, amd-gfx list, Christian König

On Sun, Jun 21, 2020 at 2:05 AM Andrey Grodzovsky
<andrey.grodzovsky@amd.com> wrote:
>
> Helper function to be used to invalidate all BOs CPU mappings
> once device is removed.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Typo in the subject:
unampping -> unmapping

Alex


> ---
>  drivers/gpu/drm/ttm/ttm_bo.c    | 8 ++++++--
>  include/drm/ttm/ttm_bo_driver.h | 7 +++++++
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index c5b516f..926a365 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1750,10 +1750,14 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
>         ttm_bo_unmap_virtual_locked(bo);
>         ttm_mem_io_unlock(man);
>  }
> -
> -
>  EXPORT_SYMBOL(ttm_bo_unmap_virtual);
>
> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
> +{
> +       unmap_mapping_range(bdev->dev_mapping, 0, 0, 1);
> +}
> +EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
> +
>  int ttm_bo_wait(struct ttm_buffer_object *bo,
>                 bool interruptible, bool no_wait)
>  {
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index c9e0fd0..39ea44f 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -601,6 +601,13 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
>  void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
>
>  /**
> + * ttm_bo_unmap_virtual_address_space
> + *
> + * @bdev: tear down all the virtual mappings for this device
> + */
> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev);
> +
> +/**
>   * ttm_bo_unmap_virtual
>   *
>   * @bo: tear down the virtual mappings for this BO
> --
> 2.7.4
>

* Re: [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove.
  2020-06-22 19:38   ` Christian König
@ 2020-06-22 19:48     ` Alex Deucher
  2020-06-23 10:22       ` Daniel Vetter
  0 siblings, 1 reply; 97+ messages in thread
From: Alex Deucher @ 2020-06-22 19:48 UTC (permalink / raw)
  To: Christian Koenig
  Cc: Andrey Grodzovsky, Daniel Vetter, Michel Dänzer,
	Maling list - DRI developers, Pekka Paalanen, amd-gfx list

On Mon, Jun 22, 2020 at 3:38 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
> > Use the new TTM interface to invalidate all exsisting BO CPU mappings
> > form all user proccesses.
> >
> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> I think those two patches could already land in amd-staging-drm-next
> since they are a good idea independent of how else we fix the other issues.

Please make sure they land in drm-misc as well.

Alex

>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 43592dc..6932d75 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -1135,6 +1135,7 @@ amdgpu_pci_remove(struct pci_dev *pdev)
> >       struct drm_device *dev = pci_get_drvdata(pdev);
> >
> >       drm_dev_unplug(dev);
> > +     ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
> >       amdgpu_driver_unload_kms(dev);
> >
> >       pci_disable_device(pdev);
>

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-22 16:45         ` Greg KH
@ 2020-06-23  4:51           ` Andrey Grodzovsky
  2020-06-23  6:05             ` Greg KH
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-23  4:51 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


On 6/22/20 12:45 PM, Greg KH wrote:
> On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
>> On 6/22/20 7:21 AM, Greg KH wrote:
>>> On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
>>>> On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
>>>>> Track sysfs files in a list so they all can be removed during pci remove
>>>>> since otherwise their removal after that causes crash because parent
>>>>> folder was already removed during pci remove.
>>> Huh?  That should not happen, do you have a backtrace of that crash?
>>
>> 2 examples in the attached trace.
> Odd, how did you trigger these?


By manually triggering PCI remove from sysfs

cd /sys/bus/pci/devices/0000\:05\:00.0 && echo 1 > remove
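
(To bring the device back for a replug test, a PCI bus rescan usually
suffices: echo 1 > /sys/bus/pci/rescan.)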


>
>
>> [  925.738225 <    0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090
>> [  925.738232 <    0.000007>] #PF: supervisor read access in kernel mode
>> [  925.738236 <    0.000004>] #PF: error_code(0x0000) - not-present page
>> [  925.738240 <    0.000004>] PGD 0 P4D 0
>> [  925.738245 <    0.000005>] Oops: 0000 [#1] SMP PTI
>> [  925.738249 <    0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
>> [  925.738256 <    0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
>> [  925.738266 <    0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110
>> [  925.738270 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
>> [  925.738282 <    0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246
>> [  925.738287 <    0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
>> [  925.738292 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000
>> [  925.738297 <    0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000
>> [  925.738302 <    0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000
>> [  925.738307 <    0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130
>> [  925.738313 <    0.000006>] FS:  00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000
>> [  925.738319 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  925.738323 <    0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0
>> [  925.738329 <    0.000006>] Call Trace:
>> [  925.738334 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
>> [  925.738339 <    0.000005>]  sysfs_remove_group+0x25/0x80
>> [  925.738344 <    0.000005>]  sysfs_remove_groups+0x29/0x40
>> [  925.738350 <    0.000006>]  free_msi_irqs+0xf5/0x190
>> [  925.738354 <    0.000004>]  pci_disable_msi+0xe9/0x120
> So the PCI core is trying to clean up attributes that it had registered,
> which is fine.  But we can't seem to find the attributes?  Were they
> already removed somewhere else?
>
> that's odd.


Yes, as I pointed out above, I am emulating device removal from sysfs and this 
triggers the PCI device remove sequence; as part of that my specific device 
folder (05:00.0) is removed from the sysfs tree.


>
>> [  925.738406 <    0.000052>]  amdgpu_irq_fini+0xe3/0xf0 [amdgpu]
>> [  925.738453 <    0.000047>]  tonga_ih_sw_fini+0xe/0x30 [amdgpu]
>> [  925.738490 <    0.000037>]  amdgpu_device_fini_late+0x14b/0x440 [amdgpu]
>> [  925.738529 <    0.000039>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
>> [  925.738548 <    0.000019>]  drm_dev_put+0x5b/0x80 [drm]
>> [  925.738558 <    0.000010>]  drm_release+0xc6/0xd0 [drm]
>> [  925.738563 <    0.000005>]  __fput+0xc6/0x260
>> [  925.738568 <    0.000005>]  task_work_run+0x79/0xb0
>> [  925.738573 <    0.000005>]  do_exit+0x3d0/0xc60
>> [  925.738578 <    0.000005>]  do_group_exit+0x47/0xb0
>> [  925.738583 <    0.000005>]  get_signal+0x18b/0xc30
>> [  925.738589 <    0.000006>]  do_signal+0x36/0x6a0
>> [  925.738593 <    0.000004>]  ? force_sig_info_to_task+0xbc/0xd0
>> [  925.738597 <    0.000004>]  ? signal_wake_up_state+0x15/0x30
>> [  925.738603 <    0.000006>]  exit_to_usermode_loop+0x6f/0xc0
>> [  925.738608 <    0.000005>]  prepare_exit_to_usermode+0xc7/0x110
>> [  925.738613 <    0.000005>]  ret_from_intr+0x25/0x35
>> [  925.738617 <    0.000004>] RIP: 0033:0x417369
>> [  925.738621 <    0.000004>] Code: Bad RIP value.
>> [  925.738625 <    0.000004>] RSP: 002b:00007ffdd6bf0900 EFLAGS: 00010246
>> [  925.738629 <    0.000004>] RAX: 00007f3eca509000 RBX: 000000000000001e RCX: 00007f3ec95ba260
>> [  925.738634 <    0.000005>] RDX: 00007f3ec9889790 RSI: 000000000000000a RDI: 0000000000000000
>> [  925.738639 <    0.000005>] RBP: 00007ffdd6bf0990 R08: 00007f3ec9889780 R09: 00007f3eca4e8700
>> [  925.738645 <    0.000006>] R10: 000000000000035c R11: 0000000000000246 R12: 00000000021c6170
>> [  925.738650 <    0.000005>] R13: 00007ffdd6bf0c00 R14: 0000000000000000 R15: 0000000000000000
>>
>>
>>
>>
>> [   40.880899 <    0.000004>] BUG: kernel NULL pointer dereference, address: 0000000000000090
>> [   40.880906 <    0.000007>] #PF: supervisor read access in kernel mode
>> [   40.880910 <    0.000004>] #PF: error_code(0x0000) - not-present page
>> [   40.880915 <    0.000005>] PGD 0 P4D 0
>> [   40.880920 <    0.000005>] Oops: 0000 [#1] SMP PTI
>> [   40.880924 <    0.000004>] CPU: 1 PID: 2526 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
>> [   40.880932 <    0.000008>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
>> [   40.880941 <    0.000009>] RIP: 0010:kernfs_find_ns+0x18/0x110
>> [   40.880945 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
>> [   40.880957 <    0.000012>] RSP: 0018:ffffaf3380467ba8 EFLAGS: 00010246
>> [   40.880963 <    0.000006>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
>> [   40.880968 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffc0678cfc RDI: 0000000000000000
>> [   40.880973 <    0.000005>] RBP: ffffffffc0678cfc R08: ffffffffaa379d10 R09: 0000000000000000
>> [   40.880979 <    0.000006>] R10: ffffaf3380467be0 R11: ffff93547615d128 R12: 0000000000000000
>> [   40.880984 <    0.000005>] R13: 0000000000000000 R14: ffffffffc0678cfc R15: ffff93549be86130
>> [   40.880990 <    0.000006>] FS:  00007fd9ecb10700(0000) GS:ffff9354bd840000(0000) knlGS:0000000000000000
>> [   40.880996 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   40.881001 <    0.000005>] CR2: 0000000000000090 CR3: 0000000072866001 CR4: 00000000000606e0
>> [   40.881006 <    0.000005>] Call Trace:
>> [   40.881011 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
>> [   40.881016 <    0.000005>]  sysfs_remove_group+0x25/0x80
>> [   40.881055 <    0.000039>]  amdgpu_device_fini_late+0x3eb/0x440 [amdgpu]
>> [   40.881095 <    0.000040>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
> Here it's your driver doing the same thing, removing attributes it
> created.  But again they are not there.
>
> So something went through and wiped the tree clean, which if I'm reading
> this correctly, your patch would not solve as you would try to also
> remove attributes that were already removed, right?


I don't think so, the stack here is from a later stage (after pci remove) where 
the last user process holding a reference to the device file dies and thus 
triggers the drm_dev_release sequence once the drm dev refcount drops to zero. 
And this is why my patch helps: I am expediting the removal of all amdgpu sysfs 
attributes to the pci remove stage, when the device folder is still present in 
the sysfs hierarchy. At least this is my understanding of why it helped. I 
admit I am not an expert on sysfs internals.
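
The shape of the fix, roughly (illustrative structure and field names,
not the literal patch):

struct amdgpu_sysfs_entry {
	struct list_head node;
	const struct attribute_group *group;
};

/* called from amdgpu_pci_remove(), while the device folder still exists */
static void amdgpu_remove_all_sysfs(struct amdgpu_device *adev)
{
	struct amdgpu_sysfs_entry *e, *tmp;

	/* every group the driver registered was also linked into this list */
	list_for_each_entry_safe(e, tmp, &adev->sysfs_list, node) {
		sysfs_remove_group(&adev->dev->kobj, e->group);
		list_del(&e->node);
		kfree(e);
	}
}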


>
> And 5.5-rc7 is a bit old (6 months and many thousands of changes ago),
> does this still happen on a modern, released kernel?


I will give it a try with the latest and greatest but it might take some time as 
I have to make a temporary context switch to some urgent task.

Andrey


>
> thanks,
>
> greg k-h

* Re: [PATCH v2 3/8] drm/ttm: Add unampping of the entire device address space
  2020-06-22  9:45   ` Daniel Vetter
@ 2020-06-23  5:00     ` Andrey Grodzovsky
  2020-06-23 10:25       ` Daniel Vetter
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-23  5:00 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher


On 6/22/20 5:45 AM, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 02:03:03AM -0400, Andrey Grodzovsky wrote:
>> Helper function to be used to invalidate all BOs CPU mappings
>> once device is removed.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> This seems to be missing the code to invalidate all the dma-buf mmaps?
>
> Probably needs more testcases if you're not yet catching this. Or am I
> missing something, and we're exchanging the the address space also for
> dma-buf?
> -Daniel


IMHO the device address space includes all user clients having a CPU view of 
the BO, whether from a direct mapping through the drm file or from a mapping 
through an imported BO's FD.

Andrey


>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo.c    | 8 ++++++--
>>   include/drm/ttm/ttm_bo_driver.h | 7 +++++++
>>   2 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index c5b516f..926a365 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -1750,10 +1750,14 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
>>   	ttm_bo_unmap_virtual_locked(bo);
>>   	ttm_mem_io_unlock(man);
>>   }
>> -
>> -
>>   EXPORT_SYMBOL(ttm_bo_unmap_virtual);
>>   
>> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
>> +{
>> +	unmap_mapping_range(bdev->dev_mapping, 0, 0, 1);
>> +}
>> +EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
>> +
>>   int ttm_bo_wait(struct ttm_buffer_object *bo,
>>   		bool interruptible, bool no_wait)
>>   {
>> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
>> index c9e0fd0..39ea44f 100644
>> --- a/include/drm/ttm/ttm_bo_driver.h
>> +++ b/include/drm/ttm/ttm_bo_driver.h
>> @@ -601,6 +601,13 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
>>   void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
>>   
>>   /**
>> + * ttm_bo_unmap_virtual_address_space
>> + *
>> + * @bdev: tear down all the virtual mappings for this device
>> + */
>> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev);
>> +
>> +/**
>>    * ttm_bo_unmap_virtual
>>    *
>>    * @bo: tear down the virtual mappings for this BO
>> -- 
>> 2.7.4
>>

* Re: [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug
  2020-06-22 19:40   ` Christian König
@ 2020-06-23  5:11     ` Andrey Grodzovsky
  2020-06-23  7:14       ` Christian König
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-23  5:11 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen


On 6/22/20 3:40 PM, Christian König wrote:
> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>> entity->rq becomes null aftre device unplugged so just return early
>> in that case.
>
> Mhm, do you have a backtrace for this?
>
> This should only be called by an IOCTL and IOCTLs should already call 
> drm_dev_enter()/exit() on their own...
>
> Christian.


See below, it's not during an IOCTL but during the release of all GEM objects 
when releasing the device. entity->rq becomes null because all the gpu 
schedulers are marked as not ready during the early pci remove stage, so the 
next time an sdma job tries to pick a scheduler to run, nothing is available 
and it is set to null. The unguarded dereference of entity->rq->sched at the 
top of amdgpu_vm_sdma_commit() is then what faults, which matches the RIP in 
the trace below.

Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382648] BUG: kernel NULL pointer 
dereference, address: 0000000000000038
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382651] #PF: supervisor read 
access in kernel mode
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382652] #PF: error_code(0x0000) 
- not-present page
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382653] PGD 0 P4D 0
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382656] Oops: 0000 [#1] SMP PTI
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382658] CPU: 6 PID: 2598 Comm: 
llvmpipe-6 Tainted: G           OE     5.6.0-dev+ #51
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382659] Hardware name: System 
manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382700] RIP: 
0010:amdgpu_vm_sdma_commit+0x6c/0x270 [amdgpu]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382702] Code: 01 00 00 48 89 ee 
48 c7 c7 ef d4 85 c0 e8 fc 5f e8 ff 48 8b 75 10 48 c7 c7 fd d4 85 c0 e8 ec 5f e8 
ff 48 8b 45 10 41 8b 55 08 <48> 8b 40 38 85 d2 48 8d b8 30 ff ff ff 0f 84 9b 01 
00 00 48 8b 80
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382704] RSP: 
0018:ffffa88e40f57950 EFLAGS: 00010282
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382705] RAX: 0000000000000000 
RBX: ffffa88e40f579a8 RCX: 0000000000000001
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382707] RDX: 0000000000000014 
RSI: ffff94d4d62388e0 RDI: ffff94d4dbd98e30
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382708] RBP: ffff94d4d2ad3288 
R08: 0000000000000000 R09: 0000000000000001
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382709] R10: 000000000000001f 
R11: 0000000000000000 R12: ffffa88e40f57a48
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382710] R13: ffff94d4d627a5e8 
R14: ffff94d4d424d978 R15: 0000000800100020
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382712] FS: 
00007f30ae694700(0000) GS:ffff94d4dbd80000(0000) knlGS:0000000000000000
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382713] CS:  0010 DS: 0000 ES: 
0000 CR0: 0000000080050033
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382714] CR2: 0000000000000038 
CR3: 0000000121810006 CR4: 00000000000606e0
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382716] Call Trace:
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382755] 
amdgpu_vm_bo_update_mapping.constprop.30+0x16b/0x230 [amdgpu]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382795] 
amdgpu_vm_clear_freed+0xd7/0x210 [amdgpu]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382833] 
amdgpu_gem_object_close+0x200/0x2b0 [amdgpu]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382856]  ? 
drm_gem_object_handle_put_unlocked+0x90/0x90 [drm]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382864]  ? 
drm_gem_object_release_handle+0x2c/0x90 [drm]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382872] 
drm_gem_object_release_handle+0x2c/0x90 [drm]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382879]  ? 
drm_gem_object_handle_put_unlocked+0x90/0x90 [drm]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382882] idr_for_each+0x48/0xd0
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382885]  ? 
_raw_spin_unlock_irqrestore+0x2d/0x50
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382893] 
drm_gem_release+0x1c/0x30 [drm]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382901] 
drm_file_free+0x21d/0x270 [drm]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382908] drm_release+0x67/0xe0 [drm]
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382912] __fput+0xc6/0x260
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382916] task_work_run+0x79/0xb0
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382919] do_exit+0x3d0/0xc40
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382921]  ? get_signal+0x13d/0xc30
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382924] do_group_exit+0x47/0xb0
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382926] get_signal+0x18b/0xc30
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382929] do_signal+0x36/0x6a0
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382931]  ? 
__set_task_comm+0x62/0x120
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382935]  ? 
__x64_sys_futex+0x88/0x180
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382938] 
exit_to_usermode_loop+0x6f/0xc0
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382941] do_syscall_64+0x149/0x1c0
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382943] 
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382944] RIP: 0033:0x7f30f7f35360
Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382947] Code: Bad RIP value.


Andrey


>
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 21 ++++++++++++++++-----
>>   1 file changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>> index 8d9c6fe..d252427 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>> @@ -24,6 +24,7 @@
>>   #include "amdgpu_job.h"
>>   #include "amdgpu_object.h"
>>   #include "amdgpu_trace.h"
>> +#include <drm/drm_drv.h>
>>     #define AMDGPU_VM_SDMA_MIN_NUM_DW    256u
>>   #define AMDGPU_VM_SDMA_MAX_NUM_DW    (16u * 1024u)
>> @@ -94,7 +95,12 @@ static int amdgpu_vm_sdma_commit(struct 
>> amdgpu_vm_update_params *p,
>>       struct drm_sched_entity *entity;
>>       struct amdgpu_ring *ring;
>>       struct dma_fence *f;
>> -    int r;
>> +    int r, idx;
>> +
>> +    if (!drm_dev_enter(p->adev->ddev, &idx)) {
>> +        r = -ENODEV;
>> +        goto nodev;
>> +    }
>>         entity = p->immediate ? &p->vm->immediate : &p->vm->delayed;
>>       ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
>> @@ -104,7 +110,7 @@ static int amdgpu_vm_sdma_commit(struct 
>> amdgpu_vm_update_params *p,
>>       WARN_ON(ib->length_dw > p->num_dw_left);
>>       r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);
>>       if (r)
>> -        goto error;
>> +        goto job_fail;
>>         if (p->unlocked) {
>>           struct dma_fence *tmp = dma_fence_get(f);
>> @@ -118,10 +124,15 @@ static int amdgpu_vm_sdma_commit(struct 
>> amdgpu_vm_update_params *p,
>>       if (fence && !p->immediate)
>>           swap(*fence, f);
>>       dma_fence_put(f);
>> -    return 0;
>>   -error:
>> -    amdgpu_job_free(p->job);
>> +    r = 0;
>> +
>> +job_fail:
>> +    drm_dev_exit(idx);
>> +nodev:
>> +    if (r)
>> +        amdgpu_job_free(p->job);
>> +
>>       return r;
>>   }
>

* Re: [PATCH v2 0/8] RFC Support hot device unplug in amdgpu
  2020-06-22  9:46 ` [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Daniel Vetter
@ 2020-06-23  5:14   ` Andrey Grodzovsky
  2020-06-23  9:04     ` Michel Dänzer
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-23  5:14 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher

I am fighting with Thunderbird to make it limit lines to 80 chars but nothing 
helps. Any suggestions, please?

Andrey

On 6/22/20 5:46 AM, Daniel Vetter wrote:
> Also a nit: Please tell your mailer to break long lines, it looks funny
> and inconsistent otherwise, at least in some of the mailers I use here :-/
> -Daniel

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-23  4:51           ` Andrey Grodzovsky
@ 2020-06-23  6:05             ` Greg KH
  2020-06-24  3:04               ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Greg KH @ 2020-06-23  6:05 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Tue, Jun 23, 2020 at 12:51:00AM -0400, Andrey Grodzovsky wrote:
> 
> On 6/22/20 12:45 PM, Greg KH wrote:
> > On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
> > > On 6/22/20 7:21 AM, Greg KH wrote:
> > > > On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
> > > > > On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
> > > > > > Track sysfs files in a list so they all can be removed during pci remove
> > > > > > since otherwise their removal after that causes crash because parent
> > > > > > folder was already removed during pci remove.
> > > > Huh?  That should not happen, do you have a backtrace of that crash?
> > > 
> > > 2 examples in the attached trace.
> > Odd, how did you trigger these?
> 
> 
> By manually triggering PCI remove from sysfs
> 
> cd /sys/bus/pci/devices/0000\:05\:00.0 && echo 1 > remove

For some reason, I didn't think that video/drm devices could handle
hot-remove like this.  The "old" PCI hotplug specification explicitly
said that video devices were not supported; has that changed?

And this whole issue is probably tied to the larger issue that Daniel
was asking me about, when it came to device lifetimes and the drm layer,
so odds are we need to fix that up first before worrying about trying to
support this crazy request, right?  :)

> > > [  925.738225 <    0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090
> > > [  925.738232 <    0.000007>] #PF: supervisor read access in kernel mode
> > > [  925.738236 <    0.000004>] #PF: error_code(0x0000) - not-present page
> > > [  925.738240 <    0.000004>] PGD 0 P4D 0
> > > [  925.738245 <    0.000005>] Oops: 0000 [#1] SMP PTI
> > > [  925.738249 <    0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
> > > [  925.738256 <    0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
> > > [  925.738266 <    0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110
> > > [  925.738270 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
> > > [  925.738282 <    0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246
> > > [  925.738287 <    0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
> > > [  925.738292 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000
> > > [  925.738297 <    0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000
> > > [  925.738302 <    0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000
> > > [  925.738307 <    0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130
> > > [  925.738313 <    0.000006>] FS:  00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000
> > > [  925.738319 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  925.738323 <    0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0
> > > [  925.738329 <    0.000006>] Call Trace:
> > > [  925.738334 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
> > > [  925.738339 <    0.000005>]  sysfs_remove_group+0x25/0x80
> > > [  925.738344 <    0.000005>]  sysfs_remove_groups+0x29/0x40
> > > [  925.738350 <    0.000006>]  free_msi_irqs+0xf5/0x190
> > > [  925.738354 <    0.000004>]  pci_disable_msi+0xe9/0x120
> > So the PCI core is trying to clean up attributes that it had registered,
> > which is fine.  But we can't seem to find the attributes?  Were they
> > already removed somewhere else?
> > 
> > that's odd.
> 
> 
> Yes, as I pointed out above, I am emulating device removal from sysfs and this
> triggers the PCI device remove sequence; as part of that my specific device
> folder (05:00.0) is removed from the sysfs tree.

But why are things being removed twice?

> > > [  925.738406 <    0.000052>]  amdgpu_irq_fini+0xe3/0xf0 [amdgpu]
> > > [  925.738453 <    0.000047>]  tonga_ih_sw_fini+0xe/0x30 [amdgpu]
> > > [  925.738490 <    0.000037>]  amdgpu_device_fini_late+0x14b/0x440 [amdgpu]
> > > [  925.738529 <    0.000039>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
> > > [  925.738548 <    0.000019>]  drm_dev_put+0x5b/0x80 [drm]
> > > [  925.738558 <    0.000010>]  drm_release+0xc6/0xd0 [drm]
> > > [  925.738563 <    0.000005>]  __fput+0xc6/0x260
> > > [  925.738568 <    0.000005>]  task_work_run+0x79/0xb0
> > > [  925.738573 <    0.000005>]  do_exit+0x3d0/0xc60
> > > [  925.738578 <    0.000005>]  do_group_exit+0x47/0xb0
> > > [  925.738583 <    0.000005>]  get_signal+0x18b/0xc30
> > > [  925.738589 <    0.000006>]  do_signal+0x36/0x6a0
> > > [  925.738593 <    0.000004>]  ? force_sig_info_to_task+0xbc/0xd0
> > > [  925.738597 <    0.000004>]  ? signal_wake_up_state+0x15/0x30
> > > [  925.738603 <    0.000006>]  exit_to_usermode_loop+0x6f/0xc0
> > > [  925.738608 <    0.000005>]  prepare_exit_to_usermode+0xc7/0x110
> > > [  925.738613 <    0.000005>]  ret_from_intr+0x25/0x35
> > > [  925.738617 <    0.000004>] RIP: 0033:0x417369
> > > [  925.738621 <    0.000004>] Code: Bad RIP value.
> > > [  925.738625 <    0.000004>] RSP: 002b:00007ffdd6bf0900 EFLAGS: 00010246
> > > [  925.738629 <    0.000004>] RAX: 00007f3eca509000 RBX: 000000000000001e RCX: 00007f3ec95ba260
> > > [  925.738634 <    0.000005>] RDX: 00007f3ec9889790 RSI: 000000000000000a RDI: 0000000000000000
> > > [  925.738639 <    0.000005>] RBP: 00007ffdd6bf0990 R08: 00007f3ec9889780 R09: 00007f3eca4e8700
> > > [  925.738645 <    0.000006>] R10: 000000000000035c R11: 0000000000000246 R12: 00000000021c6170
> > > [  925.738650 <    0.000005>] R13: 00007ffdd6bf0c00 R14: 0000000000000000 R15: 0000000000000000
> > > 
> > > 
> > > 
> > > 
> > > [   40.880899 <    0.000004>] BUG: kernel NULL pointer dereference, address: 0000000000000090
> > > [   40.880906 <    0.000007>] #PF: supervisor read access in kernel mode
> > > [   40.880910 <    0.000004>] #PF: error_code(0x0000) - not-present page
> > > [   40.880915 <    0.000005>] PGD 0 P4D 0
> > > [   40.880920 <    0.000005>] Oops: 0000 [#1] SMP PTI
> > > [   40.880924 <    0.000004>] CPU: 1 PID: 2526 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
> > > [   40.880932 <    0.000008>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
> > > [   40.880941 <    0.000009>] RIP: 0010:kernfs_find_ns+0x18/0x110
> > > [   40.880945 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
> > > [   40.880957 <    0.000012>] RSP: 0018:ffffaf3380467ba8 EFLAGS: 00010246
> > > [   40.880963 <    0.000006>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
> > > [   40.880968 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffc0678cfc RDI: 0000000000000000
> > > [   40.880973 <    0.000005>] RBP: ffffffffc0678cfc R08: ffffffffaa379d10 R09: 0000000000000000
> > > [   40.880979 <    0.000006>] R10: ffffaf3380467be0 R11: ffff93547615d128 R12: 0000000000000000
> > > [   40.880984 <    0.000005>] R13: 0000000000000000 R14: ffffffffc0678cfc R15: ffff93549be86130
> > > [   40.880990 <    0.000006>] FS:  00007fd9ecb10700(0000) GS:ffff9354bd840000(0000) knlGS:0000000000000000
> > > [   40.880996 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   40.881001 <    0.000005>] CR2: 0000000000000090 CR3: 0000000072866001 CR4: 00000000000606e0
> > > [   40.881006 <    0.000005>] Call Trace:
> > > [   40.881011 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
> > > [   40.881016 <    0.000005>]  sysfs_remove_group+0x25/0x80
> > > [   40.881055 <    0.000039>]  amdgpu_device_fini_late+0x3eb/0x440 [amdgpu]
> > > [   40.881095 <    0.000040>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
>> Here it's your driver doing the same thing, removing attributes it
>> created.  But again they are not there.
> > 
> > So something went through and wiped the tree clean, which if I'm reading
> > this correctly, your patch would not solve as you would try to also
> > remove attributes that were already removed, right?
> 
> 
> I don't think so, the stack here is from a later stage (after pci remove),
> where the last user process holding a reference to the device file decides
> to die, thus triggering the drm_dev_release sequence after the drm dev
> refcount dropped to zero. And this is why my patch helps: I am expediting
> all amdgpu sysfs attribute removal to the pci remove stage, when the device
> folder is still present in the sysfs hierarchy. At least this is my
> understanding of why it helped. I admit I am not an expert on sysfs internals.

Ok, yeah, I think this is back to the drm lifecycle issues mentioned
above.

{sigh}, I'll get to that once I deal with the -rc1/-rc2 merge fallout,
that will take me a week or so, sorry...

thanks,

greg k-h
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug
  2020-06-23  5:11     ` Andrey Grodzovsky
@ 2020-06-23  7:14       ` Christian König
  0 siblings, 0 replies; 97+ messages in thread
From: Christian König @ 2020-06-23  7:14 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel
  Cc: alexdeucher, daniel.vetter, michel, ppaalanen

Am 23.06.20 um 07:11 schrieb Andrey Grodzovsky:
>
> On 6/22/20 3:40 PM, Christian König wrote:
>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>> entity->rq becomes null after the device is unplugged, so just return
>>> early in that case.
>>
>> Mhm, do you have a backtrace for this?
>>
>> This should only be called by an IOCTL and IOCTLs should already call 
>> drm_dev_enter()/exit() on their own...
>>
>> Christian.
>
>
> See below, it's not during an IOCTL but during the release of all GEM
> objects when releasing the device. entity->rq becomes null because all
> the gpu schedulers are marked as not ready during the early pci remove
> stage, so the next time an sdma job tries to pick a scheduler to run,
> nothing is available and it's set to null.

I see. This should then probably go into amdgpu_gem_object_close() 
before we reserve the PD.

See, drm_dev_enter()/exit() are kind of a read-side lock, and with this we
create a nice lock inversion when we do it in the low-level SDMA VM backend.
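
(A rough sketch of the suggested move; the function body is elided and only
shows where the drm_dev_enter()/exit() pair would sit, before the PD
reservation, so the exact placement is illustrative, not the real code:)

static void amdgpu_gem_object_close(struct drm_gem_object *obj,
				    struct drm_file *file_priv)
{
	int idx;

	/* Take the read-side device guard before reserving the PD, so the
	 * lock order is always drm_dev_enter -> dma_resv and the inversion
	 * in the SDMA VM backend goes away. */
	if (!drm_dev_enter(obj->dev, &idx))
		return;

	/* ... reserve the PD and clear/update the VM mappings ... */

	drm_dev_exit(idx);
}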

Christian.

>
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382648] BUG: kernel NULL pointer dereference, address: 0000000000000038
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382651] #PF: supervisor read access in kernel mode
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382652] #PF: error_code(0x0000) - not-present page
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382653] PGD 0 P4D 0
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382656] Oops: 0000 [#1] SMP PTI
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382658] CPU: 6 PID: 2598 Comm: llvmpipe-6 Tainted: G           OE     5.6.0-dev+ #51
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382659] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382700] RIP: 0010:amdgpu_vm_sdma_commit+0x6c/0x270 [amdgpu]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382702] Code: 01 00 00 48 89 ee 48 c7 c7 ef d4 85 c0 e8 fc 5f e8 ff 48 8b 75 10 48 c7 c7 fd d4 85 c0 e8 ec 5f e8 ff 48 8b 45 10 41 8b 55 08 <48> 8b 40 38 85 d2 48 8d b8 30 ff ff ff 0f 84 9b 01 00 00 48 8b 80
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382704] RSP: 0018:ffffa88e40f57950 EFLAGS: 00010282
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382705] RAX: 0000000000000000 RBX: ffffa88e40f579a8 RCX: 0000000000000001
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382707] RDX: 0000000000000014 RSI: ffff94d4d62388e0 RDI: ffff94d4dbd98e30
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382708] RBP: ffff94d4d2ad3288 R08: 0000000000000000 R09: 0000000000000001
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382709] R10: 000000000000001f R11: 0000000000000000 R12: ffffa88e40f57a48
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382710] R13: ffff94d4d627a5e8 R14: ffff94d4d424d978 R15: 0000000800100020
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382712] FS:  00007f30ae694700(0000) GS:ffff94d4dbd80000(0000) knlGS:0000000000000000
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382713] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382714] CR2: 0000000000000038 CR3: 0000000121810006 CR4: 00000000000606e0
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382716] Call Trace:
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382755]  amdgpu_vm_bo_update_mapping.constprop.30+0x16b/0x230 [amdgpu]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382795]  amdgpu_vm_clear_freed+0xd7/0x210 [amdgpu]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382833]  amdgpu_gem_object_close+0x200/0x2b0 [amdgpu]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382856]  ? drm_gem_object_handle_put_unlocked+0x90/0x90 [drm]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382864]  ? drm_gem_object_release_handle+0x2c/0x90 [drm]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382872]  drm_gem_object_release_handle+0x2c/0x90 [drm]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382879]  ? drm_gem_object_handle_put_unlocked+0x90/0x90 [drm]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382882]  idr_for_each+0x48/0xd0
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382885]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382893]  drm_gem_release+0x1c/0x30 [drm]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382901]  drm_file_free+0x21d/0x270 [drm]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382908]  drm_release+0x67/0xe0 [drm]
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382912]  __fput+0xc6/0x260
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382916]  task_work_run+0x79/0xb0
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382919]  do_exit+0x3d0/0xc40
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382921]  ? get_signal+0x13d/0xc30
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382924]  do_group_exit+0x47/0xb0
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382926]  get_signal+0x18b/0xc30
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382929]  do_signal+0x36/0x6a0
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382931]  ? __set_task_comm+0x62/0x120
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382935]  ? __x64_sys_futex+0x88/0x180
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382938]  exit_to_usermode_loop+0x6f/0xc0
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382941]  do_syscall_64+0x149/0x1c0
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382943]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382944] RIP: 0033:0x7f30f7f35360
> Jun  8 11:14:56 ubuntu-1604-test kernel: [   44.382947] Code: Bad RIP value.
>
>
> Andrey
>
>
>>
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 21 ++++++++++++++++-----
>>>   1 file changed, 16 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>>> index 8d9c6fe..d252427 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
>>> @@ -24,6 +24,7 @@
>>>   #include "amdgpu_job.h"
>>>   #include "amdgpu_object.h"
>>>   #include "amdgpu_trace.h"
>>> +#include <drm/drm_drv.h>
>>>     #define AMDGPU_VM_SDMA_MIN_NUM_DW    256u
>>>   #define AMDGPU_VM_SDMA_MAX_NUM_DW    (16u * 1024u)
>>> @@ -94,7 +95,12 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>>>       struct drm_sched_entity *entity;
>>>       struct amdgpu_ring *ring;
>>>       struct dma_fence *f;
>>> -    int r;
>>> +    int r, idx;
>>> +
>>> +    if (!drm_dev_enter(p->adev->ddev, &idx)) {
>>> +        r = -ENODEV;
>>> +        goto nodev;
>>> +    }
>>>         entity = p->immediate ? &p->vm->immediate : &p->vm->delayed;
>>>       ring = container_of(entity->rq->sched, struct amdgpu_ring, sched);
>>> @@ -104,7 +110,7 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>>>       WARN_ON(ib->length_dw > p->num_dw_left);
>>>       r = amdgpu_job_submit(p->job, entity, AMDGPU_FENCE_OWNER_VM, &f);
>>>       if (r)
>>> -        goto error;
>>> +        goto job_fail;
>>>         if (p->unlocked) {
>>>           struct dma_fence *tmp = dma_fence_get(f);
>>> @@ -118,10 +124,15 @@ static int amdgpu_vm_sdma_commit(struct amdgpu_vm_update_params *p,
>>>       if (fence && !p->immediate)
>>>           swap(*fence, f);
>>>       dma_fence_put(f);
>>> -    return 0;
>>>   -error:
>>> -    amdgpu_job_free(p->job);
>>> +    r = 0;
>>> +
>>> +job_fail:
>>> +    drm_dev_exit(idx);
>>> +nodev:
>>> +    if (r)
>>> +        amdgpu_job_free(p->job);
>>> +
>>>       return r;
>>>   }
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 0/8] RFC Support hot device unplug in amdgpu
  2020-06-23  5:14   ` Andrey Grodzovsky
@ 2020-06-23  9:04     ` Michel Dänzer
  2020-06-24  3:21       ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Michel Dänzer @ 2020-06-23  9:04 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: daniel.vetter, amd-gfx, dri-devel, ckoenig.leichtzumerken

On 2020-06-23 7:14 a.m., Andrey Grodzovsky wrote:
> I am fighting with Thunderbird to make it limit lines to 80 chars, but
> nothing helps. Any suggestions, please?

Maybe try disabling mail.compose.default_to_paragraph, or check other
*wrap* settings.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove.
  2020-06-22 19:48     ` Alex Deucher
@ 2020-06-23 10:22       ` Daniel Vetter
  2020-06-23 13:16         ` Christian König
  0 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-06-23 10:22 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Andrey Grodzovsky, Daniel Vetter, Michel Dänzer,
	Maling list - DRI developers, Pekka Paalanen, amd-gfx list,
	Christian Koenig

On Mon, Jun 22, 2020 at 03:48:29PM -0400, Alex Deucher wrote:
> On Mon, Jun 22, 2020 at 3:38 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
> > > Use the new TTM interface to invalidate all existing BO CPU mappings
> > > from all user processes.
> > >
> > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >
> > Reviewed-by: Christian König <christian.koenig@amd.com>
> >
> > I think those two patches could already land in amd-staging-drm-next
> > since they are a good idea independent of how else we fix the other issues.
> 
> Please make sure they land in drm-misc as well.

Not sure that's much use, since without any of the fault side changes you
just blow up on the first refault. Seems somewhat silly to charge ahead on
this with the other bits still very much under discussion.

Plus I suggested a possible bikeshed here :-)
-Daniel

> 
> Alex
> 
> >
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index 43592dc..6932d75 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -1135,6 +1135,7 @@ amdgpu_pci_remove(struct pci_dev *pdev)
> > >       struct drm_device *dev = pci_get_drvdata(pdev);
> > >
> > >       drm_dev_unplug(dev);
> > > +     ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
> > >       amdgpu_driver_unload_kms(dev);
> > >
> > >       pci_disable_device(pdev);
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 3/8] drm/ttm: Add unampping of the entire device address space
  2020-06-23  5:00     ` Andrey Grodzovsky
@ 2020-06-23 10:25       ` Daniel Vetter
  2020-06-23 12:55         ` Christian König
  0 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-06-23 10:25 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Tue, Jun 23, 2020 at 01:00:02AM -0400, Andrey Grodzovsky wrote:
> 
> On 6/22/20 5:45 AM, Daniel Vetter wrote:
> > On Sun, Jun 21, 2020 at 02:03:03AM -0400, Andrey Grodzovsky wrote:
> > > Helper function to be used to invalidate all BOs' CPU mappings
> > > once the device is removed.
> > > 
> > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > This seems to be missing the code to invalidate all the dma-buf mmaps?
> > 
> > Probably needs more testcases if you're not yet catching this. Or am I
> > missing something, and we're exchanging the address space also for
> > dma-buf?
> > -Daniel
> 
> 
> IMHO the device address space includes all user clients having a CPU view of
> the BO, either from direct mapping through the drm file or by mapping
> through an imported BO's FD.

Uh this is all very confusing and very much midlayer-y thanks to ttm.

I think a much better solution would be to have a core gem helper for
this (well not even gem really, this is core drm), which directly uses
drm_device->anon_inode->i_mapping.

Then
a) it clearly matches what drm_prime.c does on export
b) can be reused across all drivers, not just ttm

So much better.

What's more, we could then very easily make the generic
drm_dev_unplug_and_unmap helper I've talked about for the amdgpu patch,
which I think would be really neat&pretty.
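
Roughly (note drm_dev_unplug_and_unmap() here is hypothetical, not an
existing DRM function, just a sketch of the idea):

/* Zap the CPU mappings of all BOs of this device in one go; every GEM
 * mmap offset of the device lives in this one address space. */
static void drm_dev_unmap_all(struct drm_device *dev)
{
	unmap_mapping_range(dev->anon_inode->i_mapping, 0, 0, 1);
}

void drm_dev_unplug_and_unmap(struct drm_device *dev)
{
	drm_dev_unplug(dev);
	drm_dev_unmap_all(dev);
}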

Thoughts?
-Daniel

> 
> Andrey
> 
> 
> > 
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_bo.c    | 8 ++++++--
> > >   include/drm/ttm/ttm_bo_driver.h | 7 +++++++
> > >   2 files changed, 13 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > > index c5b516f..926a365 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > @@ -1750,10 +1750,14 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
> > >   	ttm_bo_unmap_virtual_locked(bo);
> > >   	ttm_mem_io_unlock(man);
> > >   }
> > > -
> > > -
> > >   EXPORT_SYMBOL(ttm_bo_unmap_virtual);
> > > +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
> > > +{
> > > +	unmap_mapping_range(bdev->dev_mapping, 0, 0, 1);
> > > +}
> > > +EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
> > > +
> > >   int ttm_bo_wait(struct ttm_buffer_object *bo,
> > >   		bool interruptible, bool no_wait)
> > >   {
> > > diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> > > index c9e0fd0..39ea44f 100644
> > > --- a/include/drm/ttm/ttm_bo_driver.h
> > > +++ b/include/drm/ttm/ttm_bo_driver.h
> > > @@ -601,6 +601,13 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
> > >   void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
> > >   /**
> > > + * ttm_bo_unmap_virtual_address_space
> > > + *
> > > + * @bdev: tear down all the virtual mappings for this device
> > > + */
> > > +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev);
> > > +
> > > +/**
> > >    * ttm_bo_unmap_virtual
> > >    *
> > >    * @bo: tear down the virtual mappings for this BO
> > > -- 
> > > 2.7.4
> > > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 3/8] drm/ttm: Add unampping of the entire device address space
  2020-06-23 10:25       ` Daniel Vetter
@ 2020-06-23 12:55         ` Christian König
  0 siblings, 0 replies; 97+ messages in thread
From: Christian König @ 2020-06-23 12:55 UTC (permalink / raw)
  To: Daniel Vetter, Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx, alexdeucher

Am 23.06.20 um 12:25 schrieb Daniel Vetter:
> On Tue, Jun 23, 2020 at 01:00:02AM -0400, Andrey Grodzovsky wrote:
>> On 6/22/20 5:45 AM, Daniel Vetter wrote:
>>> On Sun, Jun 21, 2020 at 02:03:03AM -0400, Andrey Grodzovsky wrote:
>>>> Helper function to be used to invalidate all BOs' CPU mappings
>>>> once the device is removed.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> This seems to be missing the code to invalidate all the dma-buf mmaps?
>>>
>>> Probably needs more testcases if you're not yet catching this. Or am I
>>> missing something, and we're exchanging the address space also for
>>> dma-buf?
>>> -Daniel
>>
>> IMHO the device address space includes all user clients having a CPU view of
>> the BO, either from direct mapping through the drm file or by mapping
>> through an imported BO's FD.
> Uh this is all very confusing and very much midlayer-y thanks to ttm.
>
> I think a much better solution would be to have a core gem helper for
> this (well not even gem really, this is core drm), which directly uses
> drm_device->anon_inode->i_mapping.
>
> Then
> a) it clearly matches what drm_prime.c does on export
> b) can be reused across all drivers, not just ttm
>
> So much better.
>
> What's more, we could then very easily make the generic
> drm_dev_unplug_and_unmap helper I've talked about for the amdgpu patch,
> which I think would be really neat&pretty.

Good point, that is indeed a rather nice idea.

Christian.

>
> Thoughts?
> -Daniel
>
>> Andrey
>>
>>
>>>> ---
>>>>    drivers/gpu/drm/ttm/ttm_bo.c    | 8 ++++++--
>>>>    include/drm/ttm/ttm_bo_driver.h | 7 +++++++
>>>>    2 files changed, 13 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> index c5b516f..926a365 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>> @@ -1750,10 +1750,14 @@ void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo)
>>>>    	ttm_bo_unmap_virtual_locked(bo);
>>>>    	ttm_mem_io_unlock(man);
>>>>    }
>>>> -
>>>> -
>>>>    EXPORT_SYMBOL(ttm_bo_unmap_virtual);
>>>> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev)
>>>> +{
>>>> +	unmap_mapping_range(bdev->dev_mapping, 0, 0, 1);
>>>> +}
>>>> +EXPORT_SYMBOL(ttm_bo_unmap_virtual_address_space);
>>>> +
>>>>    int ttm_bo_wait(struct ttm_buffer_object *bo,
>>>>    		bool interruptible, bool no_wait)
>>>>    {
>>>> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
>>>> index c9e0fd0..39ea44f 100644
>>>> --- a/include/drm/ttm/ttm_bo_driver.h
>>>> +++ b/include/drm/ttm/ttm_bo_driver.h
>>>> @@ -601,6 +601,13 @@ int ttm_bo_device_init(struct ttm_bo_device *bdev,
>>>>    void ttm_bo_unmap_virtual(struct ttm_buffer_object *bo);
>>>>    /**
>>>> + * ttm_bo_unmap_virtual_address_space
>>>> + *
>>>> + * @bdev: tear down all the virtual mappings for this device
>>>> + */
>>>> +void ttm_bo_unmap_virtual_address_space(struct ttm_bo_device *bdev);
>>>> +
>>>> +/**
>>>>     * ttm_bo_unmap_virtual
>>>>     *
>>>>     * @bo: tear down the virtual mappings for this BO
>>>> -- 
>>>> 2.7.4
>>>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove.
  2020-06-23 10:22       ` Daniel Vetter
@ 2020-06-23 13:16         ` Christian König
  2020-06-24  3:12           ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Christian König @ 2020-06-23 13:16 UTC (permalink / raw)
  To: Daniel Vetter, Alex Deucher
  Cc: Andrey Grodzovsky, Daniel Vetter, Michel Dänzer,
	Maling list - DRI developers, Pekka Paalanen, amd-gfx list

Am 23.06.20 um 12:22 schrieb Daniel Vetter:
> On Mon, Jun 22, 2020 at 03:48:29PM -0400, Alex Deucher wrote:
>> On Mon, Jun 22, 2020 at 3:38 PM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>> Use the new TTM interface to invalidate all existing BO CPU mappings
>>>> from all user processes.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>
>>> I think those two patches could already land in amd-staging-drm-next
>>> since they are a good idea independent of how else we fix the other issues.
>> Please make sure they land in drm-misc as well.
> Not sure that's much use, since without any of the fault side changes you
> just blow up on the first refault. Seems somewhat silly to charge ahead on
> this with the other bits still very much under discussion.

Well what I wanted to say is that we don't need to send out those simple 
patches once more.

> Plus I suggested a possible bikeshed here :-)

No bikeshed, but indeed a rather good idea to not make this a TTM function.

Christian.

> -Daniel
>
>> Alex
>>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> index 43592dc..6932d75 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> @@ -1135,6 +1135,7 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>>>>        struct drm_device *dev = pci_get_drvdata(pdev);
>>>>
>>>>        drm_dev_unplug(dev);
>>>> +     ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>>>        amdgpu_driver_unload_kms(dev);
>>>>
>>>>        pci_disable_device(pdev);

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-23  6:05             ` Greg KH
@ 2020-06-24  3:04               ` Andrey Grodzovsky
  2020-06-24  6:11                 ` Greg KH
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-24  3:04 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


On 6/23/20 2:05 AM, Greg KH wrote:
> On Tue, Jun 23, 2020 at 12:51:00AM -0400, Andrey Grodzovsky wrote:
>> On 6/22/20 12:45 PM, Greg KH wrote:
>>> On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
>>>> On 6/22/20 7:21 AM, Greg KH wrote:
>>>>> On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
>>>>>> On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
>>>>>>> Track sysfs files in a list so they can all be removed during pci remove,
>>>>>>> since otherwise removing them later crashes because the parent
>>>>>>> folder was already removed during pci remove.
>>>>> Huh?  That should not happen, do you have a backtrace of that crash?
>>>> 2 examples in the attached trace.
>>> Odd, how did you trigger these?
>>
>> By manually triggering PCI remove from sysfs
>>
>> cd /sys/bus/pci/devices/0000\:05\:00.0 && echo 1 > remove
> For some reason, I didn't think that video/drm devices could handle
> hot-remove like this.  The "old" PCI hotplug specification explicitly
> said that video devices were not supported, has that changed?
>
> And this whole issue is probably tied to the larger issue that Daniel
> was asking me about, when it came to device lifetimes and the drm layer,
> so odds are we need to fix that up first before worrying about trying to
> support this crazy request, right?  :)
>
>>>> [  925.738225 <    0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090
>>>> [  925.738232 <    0.000007>] #PF: supervisor read access in kernel mode
>>>> [  925.738236 <    0.000004>] #PF: error_code(0x0000) - not-present page
>>>> [  925.738240 <    0.000004>] PGD 0 P4D 0
>>>> [  925.738245 <    0.000005>] Oops: 0000 [#1] SMP PTI
>>>> [  925.738249 <    0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
>>>> [  925.738256 <    0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
>>>> [  925.738266 <    0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110
>>>> [  925.738270 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
>>>> [  925.738282 <    0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246
>>>> [  925.738287 <    0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
>>>> [  925.738292 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000
>>>> [  925.738297 <    0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000
>>>> [  925.738302 <    0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000
>>>> [  925.738307 <    0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130
>>>> [  925.738313 <    0.000006>] FS:  00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000
>>>> [  925.738319 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [  925.738323 <    0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0
>>>> [  925.738329 <    0.000006>] Call Trace:
>>>> [  925.738334 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
>>>> [  925.738339 <    0.000005>]  sysfs_remove_group+0x25/0x80
>>>> [  925.738344 <    0.000005>]  sysfs_remove_groups+0x29/0x40
>>>> [  925.738350 <    0.000006>]  free_msi_irqs+0xf5/0x190
>>>> [  925.738354 <    0.000004>]  pci_disable_msi+0xe9/0x120
>>> So the PCI core is trying to clean up attributes that it had registered,
>>> which is fine.  But we can't seem to find the attributes?  Were they
>>> already removed somewhere else?
>>>
>>> that's odd.
>>
>> Yes, as I pointed out above, I am emulating device removal from sysfs, and
>> this triggers the PCI device remove sequence; as part of that, my specific
>> device folder (05:00.0) is removed from the sysfs tree.
> But why are things being removed twice?


Not sure I understand. What is removed twice? I remove each sysfs attribute only once.
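
(For context, the approach in the v2 patch is roughly the following sketch;
the struct and function names are approximate, not the exact patch:)

/* Every sysfs file the driver creates is also linked into a per-device
 * list. */
struct amdgpu_sysfs_node {
	struct list_head head;
	struct device_attribute *attr;
};

/* Called from amdgpu_pci_remove(), while the device's sysfs directory
 * still exists, instead of waiting for the deferred drm_dev_release. */
static void amdgpu_sysfs_fini(struct amdgpu_device *adev)
{
	struct amdgpu_sysfs_node *node;

	list_for_each_entry(node, &adev->sysfs_files_list, head)
		device_remove_file(adev->dev, node->attr);
}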

Andrey


>
>>>> [  925.738406 <    0.000052>]  amdgpu_irq_fini+0xe3/0xf0 [amdgpu]
>>>> [  925.738453 <    0.000047>]  tonga_ih_sw_fini+0xe/0x30 [amdgpu]
>>>> [  925.738490 <    0.000037>]  amdgpu_device_fini_late+0x14b/0x440 [amdgpu]
>>>> [  925.738529 <    0.000039>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
>>>> [  925.738548 <    0.000019>]  drm_dev_put+0x5b/0x80 [drm]
>>>> [  925.738558 <    0.000010>]  drm_release+0xc6/0xd0 [drm]
>>>> [  925.738563 <    0.000005>]  __fput+0xc6/0x260
>>>> [  925.738568 <    0.000005>]  task_work_run+0x79/0xb0
>>>> [  925.738573 <    0.000005>]  do_exit+0x3d0/0xc60
>>>> [  925.738578 <    0.000005>]  do_group_exit+0x47/0xb0
>>>> [  925.738583 <    0.000005>]  get_signal+0x18b/0xc30
>>>> [  925.738589 <    0.000006>]  do_signal+0x36/0x6a0
>>>> [  925.738593 <    0.000004>]  ? force_sig_info_to_task+0xbc/0xd0
>>>> [  925.738597 <    0.000004>]  ? signal_wake_up_state+0x15/0x30
>>>> [  925.738603 <    0.000006>]  exit_to_usermode_loop+0x6f/0xc0
>>>> [  925.738608 <    0.000005>]  prepare_exit_to_usermode+0xc7/0x110
>>>> [  925.738613 <    0.000005>]  ret_from_intr+0x25/0x35
>>>> [  925.738617 <    0.000004>] RIP: 0033:0x417369
>>>> [  925.738621 <    0.000004>] Code: Bad RIP value.
>>>> [  925.738625 <    0.000004>] RSP: 002b:00007ffdd6bf0900 EFLAGS: 00010246
>>>> [  925.738629 <    0.000004>] RAX: 00007f3eca509000 RBX: 000000000000001e RCX: 00007f3ec95ba260
>>>> [  925.738634 <    0.000005>] RDX: 00007f3ec9889790 RSI: 000000000000000a RDI: 0000000000000000
>>>> [  925.738639 <    0.000005>] RBP: 00007ffdd6bf0990 R08: 00007f3ec9889780 R09: 00007f3eca4e8700
>>>> [  925.738645 <    0.000006>] R10: 000000000000035c R11: 0000000000000246 R12: 00000000021c6170
>>>> [  925.738650 <    0.000005>] R13: 00007ffdd6bf0c00 R14: 0000000000000000 R15: 0000000000000000
>>>>
>>>>
>>>>
>>>>
>>>> [   40.880899 <    0.000004>] BUG: kernel NULL pointer dereference, address: 0000000000000090
>>>> [   40.880906 <    0.000007>] #PF: supervisor read access in kernel mode
>>>> [   40.880910 <    0.000004>] #PF: error_code(0x0000) - not-present page
>>>> [   40.880915 <    0.000005>] PGD 0 P4D 0
>>>> [   40.880920 <    0.000005>] Oops: 0000 [#1] SMP PTI
>>>> [   40.880924 <    0.000004>] CPU: 1 PID: 2526 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
>>>> [   40.880932 <    0.000008>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
>>>> [   40.880941 <    0.000009>] RIP: 0010:kernfs_find_ns+0x18/0x110
>>>> [   40.880945 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
>>>> [   40.880957 <    0.000012>] RSP: 0018:ffffaf3380467ba8 EFLAGS: 00010246
>>>> [   40.880963 <    0.000006>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
>>>> [   40.880968 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffc0678cfc RDI: 0000000000000000
>>>> [   40.880973 <    0.000005>] RBP: ffffffffc0678cfc R08: ffffffffaa379d10 R09: 0000000000000000
>>>> [   40.880979 <    0.000006>] R10: ffffaf3380467be0 R11: ffff93547615d128 R12: 0000000000000000
>>>> [   40.880984 <    0.000005>] R13: 0000000000000000 R14: ffffffffc0678cfc R15: ffff93549be86130
>>>> [   40.880990 <    0.000006>] FS:  00007fd9ecb10700(0000) GS:ffff9354bd840000(0000) knlGS:0000000000000000
>>>> [   40.880996 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [   40.881001 <    0.000005>] CR2: 0000000000000090 CR3: 0000000072866001 CR4: 00000000000606e0
>>>> [   40.881006 <    0.000005>] Call Trace:
>>>> [   40.881011 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
>>>> [   40.881016 <    0.000005>]  sysfs_remove_group+0x25/0x80
>>>> [   40.881055 <    0.000039>]  amdgpu_device_fini_late+0x3eb/0x440 [amdgpu]
>>>> [   40.881095 <    0.000040>]  amdgpu_driver_release_kms+0x16/0x40 [amdgpu]
>>> Here it's your driver doing the same thing, removing attributes it
>>> created.  But again they are not there.
>>>
>>> So something went through and wiped the tree clean, which if I'm reading
>>> this correctly, your patch would not solve as you would try to also
>>> remove attributes that were already removed, right?
>>
>> I don't think so, the stack here is from a later stage (after pci remove),
>> where the last user process holding a reference to the device file decides
>> to die, thus triggering the drm_dev_release sequence after the drm dev
>> refcount dropped to zero. And this is why my patch helps: I am expediting
>> all amdgpu sysfs attribute removal to the pci remove stage, when the device
>> folder is still present in the sysfs hierarchy. At least this is my
>> understanding of why it helped. I admit I am not an expert on sysfs internals.
> Ok, yeah, I think this is back to the drm lifecycle issues mentioned
> above.
>
> {sigh}, I'll get to that once I deal with the -rc1/-rc2 merge fallout,
> that will take me a week or so, sorry...
>
> thanks,
>
> greg k-h
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove.
  2020-06-23 13:16         ` Christian König
@ 2020-06-24  3:12           ` Andrey Grodzovsky
  0 siblings, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-24  3:12 UTC (permalink / raw)
  To: Christian König, Daniel Vetter, Alex Deucher
  Cc: Daniel Vetter, Michel Dänzer, Pekka Paalanen,
	Maling list - DRI developers, amd-gfx list


On 6/23/20 9:16 AM, Christian König wrote:
> Am 23.06.20 um 12:22 schrieb Daniel Vetter:
>> On Mon, Jun 22, 2020 at 03:48:29PM -0400, Alex Deucher wrote:
>>> On Mon, Jun 22, 2020 at 3:38 PM Christian König
>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>>> Use the new TTM interface to invalidate all existing BO CPU mappings
>>>>> from all user processes.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> Reviewed-by: Christian König <christian.koenig@amd.com>
>>>>
>>>> I think those two patches could already land in amd-staging-drm-next
>>>> since they are a good idea independent of how else we fix the other issues.
>>> Please make sure they land in drm-misc as well.
>> Not sure that's much use, since without any of the fault side changes you
>> just blow up on the first refault. Seems somewhat silly to charge ahead on
>> this with the other bits still very much under discussion.
>
> Well what I wanted to say is that we don't need to send out those simple 
> patches once more.
>
>> Plus I suggested a possible bikeshed here :-)
>
> No bikeshed, but indeed a rather good idea to not make this a TTM function.
>
> Christian.


So I will incorporate the suggested changes to turn the TTM part into a
generic DRM helper and will resend both patches as part of v3 (which might
take a while now due to a context switch I am doing for another task).

Andrey


>
>> -Daniel
>>
>>> Alex
>>>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
>>>>>    1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> index 43592dc..6932d75 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> @@ -1135,6 +1135,7 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>>>>>        struct drm_device *dev = pci_get_drvdata(pdev);
>>>>>
>>>>>        drm_dev_unplug(dev);
>>>>> +     ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>>>>        amdgpu_driver_unload_kms(dev);
>>>>>
>>>>>        pci_disable_device(pdev);
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 0/8] RFC Support hot device unplug in amdgpu
  2020-06-23  9:04     ` Michel Dänzer
@ 2020-06-24  3:21       ` Andrey Grodzovsky
  0 siblings, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-24  3:21 UTC (permalink / raw)
  To: Michel Dänzer, Daniel Vetter
  Cc: daniel.vetter, amd-gfx, dri-devel, ckoenig.leichtzumerken

Tried, didn't have any impact

Andrey


On 6/23/20 5:04 AM, Michel Dänzer wrote:
> On 2020-06-23 7:14 a.m., Andrey Grodzovsky wrote:
>> I am fighting with Thunderbird to make it limit lines to 80 chars, but
>> nothing helps. Any suggestions, please?
> Maybe try disabling mail.compose.default_to_paragraph, or check other
> *wrap* settings.
>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page.
  2020-06-22  9:41   ` Daniel Vetter
@ 2020-06-24  3:31     ` Andrey Grodzovsky
  2020-06-24  7:19       ` Daniel Vetter
  2020-11-10 17:41     ` Andrey Grodzovsky
  1 sibling, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-24  3:31 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher


On 6/22/20 5:41 AM, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 02:03:02AM -0400, Andrey Grodzovsky wrote:
>> On device removal reroute all CPU mappings to dummy page per drm_file
>> instance or imported GEM object.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo_vm.c | 65 ++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 57 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> index 389128b..2f8bf5e 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> @@ -35,6 +35,8 @@
>>   #include <drm/ttm/ttm_bo_driver.h>
>>   #include <drm/ttm/ttm_placement.h>
>>   #include <drm/drm_vma_manager.h>
>> +#include <drm/drm_drv.h>
>> +#include <drm/drm_file.h>
>>   #include <linux/mm.h>
>>   #include <linux/pfn_t.h>
>>   #include <linux/rbtree.h>
>> @@ -328,19 +330,66 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
> Hm I think diff and code flow look a bit bad now. What about renaming the
> current function to __ttm_bo_vm_fault and then having something like the
> below:
>
> ttm_bo_vm_fault(args) {
>
> 	if (drm_dev_enter()) {
> 		__ttm_bo_vm_fault(args);
> 		drm_dev_exit();
> 	} else  {
> 		drm_gem_insert_dummy_pfn();
> 	}
> }
>
> I think drm_gem_insert_dummy_pfn(); should be portable across drivers, so
> another nice point to try to unify drivers as much as possible.
> -Daniel
>
>>   	pgprot_t prot;
>>   	struct ttm_buffer_object *bo = vma->vm_private_data;
>>   	vm_fault_t ret;
>> +	int idx;
>> +	struct drm_device *ddev = bo->base.dev;
>>   
>> -	ret = ttm_bo_vm_reserve(bo, vmf);
>> -	if (ret)
>> -		return ret;
>> +	if (drm_dev_enter(ddev, &idx)) {
>> +		ret = ttm_bo_vm_reserve(bo, vmf);
>> +		if (ret)
>> +			goto exit;
>> +
>> +		prot = vma->vm_page_prot;
>>   
>> -	prot = vma->vm_page_prot;
>> -	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
>> -	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
>> +		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
>> +		if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
>> +			goto exit;
>> +
>> +		dma_resv_unlock(bo->base.resv);
>> +
>> +exit:
>> +		drm_dev_exit(idx);
>>   		return ret;
>> +	} else {
>>   
>> -	dma_resv_unlock(bo->base.resv);
>> +		struct drm_file *file = NULL;
>> +		struct page *dummy_page = NULL;
>> +		int handle;
>>   
>> -	return ret;
>> +		/* We are faulting on imported BO from dma_buf */
>> +		if (bo->base.dma_buf && bo->base.import_attach) {
>> +			dummy_page = bo->base.dummy_page;
>> +		/* We are faulting on non imported BO, find drm_file owning the BO*/
> Uh, we can't fish that out of the vma->vm_file pointer somehow? Or is that
> one all wrong? Doing this kind of list walk looks pretty horrible.
>
> If the vma doesn't have the right pointer I guess next option is that we
> store the drm_file page in gem_bo->dummy_page, and replace it on first
> export. But that's going to be tricky to track ...
>
>> +		} else {
>> +			struct drm_gem_object *gobj;
>> +
>> +			mutex_lock(&ddev->filelist_mutex);
>> +			list_for_each_entry(file, &ddev->filelist, lhead) {
>> +				spin_lock(&file->table_lock);
>> +				idr_for_each_entry(&file->object_idr, gobj, handle) {
>> +					if (gobj == &bo->base) {
>> +						dummy_page = file->dummy_page;
>> +						break;
>> +					}
>> +				}
>> +				spin_unlock(&file->table_lock);
>> +			}
>> +			mutex_unlock(&ddev->filelist_mutex);
>> +		}
>> +
>> +		if (dummy_page) {
>> +			/*
>> +			 * Let do_fault complete the PTE install e.t.c using vmf->page
>> +			 *
>> +			 * TODO - should i call free_page somewhere ?
> Nah, instead don't call get_page. The page will be around as long as
> there's a reference for the drm_file or gem_bo, which is longer than any
> mmap. Otherwise yes, this would leak really badly.


So actually that was my thinking in the first place, and I indeed avoided
taking a reference. That ended up with multiple BUG_ONs, as seen below,
where refcount:-63 mapcount:-48 shows a page deep into negative values...
Those warnings were gone once I added get_page(dummy), which in my opinion
implies that there is a page reference per PTE: when the process address
space is unmapped and the PTEs are deleted, there is also a put_page
somewhere in the mm core, so the get_page per mapping keeps it balanced.

Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762929] BUG: Bad page map in process glxgear:disk$0  pte:8000000132284867 pmd:15aaec067
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762931] page:ffffe63384c8a100 refcount:-63 mapcount:-48 mapping:0000000000000000 index:0x0
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762932] flags: 0x17fff8000000008(dirty)
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762933] raw: 017fff8000000008 dead000000000100 dead000000000122 0000000000000000
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762934] raw: 0000000000000000 0000000000000000 ffffffc1ffffffcf 0000000000000000
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762935] page dumped because: bad pte
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762937] addr:00007fe086263000 vm_flags:1c0440fb anon_vma:0000000000000000 mapping:ffff9b5cd42db268 index:1008b3
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762981] file:renderD129 fault:ttm_bo_vm_fault [ttm] mmap:amdgpu_mmap [amdgpu] readpage:0x0
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762984] CPU: 5 PID: 2619 Comm: glxgear:disk$0 Tainted: G    B      OE 5.6.0-dev+ #51
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762985] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762985] Call Trace:
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762988]  dump_stack+0x68/0x9b
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762990]  print_bad_pte+0x19f/0x270
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762992]  ? lock_page_memcg+0x5/0xf0
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762995]  unmap_page_range+0x777/0xbe0
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763000]  unmap_vmas+0xcc/0x160
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763004]  exit_mmap+0xb5/0x1b0
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763009]  mmput+0x65/0x140
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763010]  do_exit+0x362/0xc40
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763013]  do_group_exit+0x47/0xb0
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763016]  get_signal+0x18b/0xc30
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763019]  do_signal+0x36/0x6a0
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763021]  ? __set_task_comm+0x62/0x120
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763024]  ? __x64_sys_futex+0x88/0x180
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763028]  exit_to_usermode_loop+0x6f/0xc0
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763030]  do_syscall_64+0x149/0x1c0
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763032]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763034] RIP: 0033:0x7fe091bd9360
Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763037] Code: Bad RIP value.
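
(For reference, the balanced pattern described above is essentially the
following in the fault handler; a minimal sketch, assuming the mm core does
the matching put_page() when the PTE is zapped on unmap:)

	/* One reference per PTE that maps the dummy page; the mm core
	 * drops it again when the mapping is torn down. */
	get_page(dummy_page);
	vmf->page = dummy_page;
	return 0;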

Andrey


>
>> +			 */
>> +			get_page(dummy_page);
>> +			vmf->page = dummy_page;
>> +			return 0;
>> +		} else {
>> +			return VM_FAULT_SIGSEGV;
> Hm that would be a kernel bug, wouldn't it? WARN_ON() required here imo.
> -Daniel
>
>> +		}
>> +	}
>>   }
>>   EXPORT_SYMBOL(ttm_bo_vm_fault);
>>   
>> -- 
>> 2.7.4
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-24  3:04               ` Andrey Grodzovsky
@ 2020-06-24  6:11                 ` Greg KH
  2020-06-25  1:52                   ` Andrey Grodzovsky
  2020-11-10 17:54                   ` Andrey Grodzovsky
  0 siblings, 2 replies; 97+ messages in thread
From: Greg KH @ 2020-06-24  6:11 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Tue, Jun 23, 2020 at 11:04:30PM -0400, Andrey Grodzovsky wrote:
> 
> On 6/23/20 2:05 AM, Greg KH wrote:
> > On Tue, Jun 23, 2020 at 12:51:00AM -0400, Andrey Grodzovsky wrote:
> > > On 6/22/20 12:45 PM, Greg KH wrote:
> > > > On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
> > > > > On 6/22/20 7:21 AM, Greg KH wrote:
> > > > > > On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
> > > > > > > On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
> > > > > > > > Track sysfs files in a list so they can all be removed during pci remove,
> > > > > > > > since otherwise removing them later crashes because the parent
> > > > > > > > folder was already removed during pci remove.
> > > > > > Huh?  That should not happen, do you have a backtrace of that crash?
> > > > > 2 examples in the attached trace.
> > > > Odd, how did you trigger these?
> > > 
> > > By manually triggering PCI remove from sysfs
> > > 
> > > cd /sys/bus/pci/devices/0000\:05\:00.0 && echo 1 > remove
> > For some reason, I didn't think that video/drm devices could handle
> > hot-remove like this.  The "old" PCI hotplug specification explicitly
> > said that video devices were not supported, has that changed?
> > 
> > And this whole issue is probably tied to the larger issue that Daniel
> > was asking me about, when it came to device lifetimes and the drm layer,
> > so odds are we need to fix that up first before worrying about trying to
> > support this crazy request, right?  :)
> > 
> > > > > [  925.738225 <    0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090
> > > > > [  925.738232 <    0.000007>] #PF: supervisor read access in kernel mode
> > > > > [  925.738236 <    0.000004>] #PF: error_code(0x0000) - not-present page
> > > > > [  925.738240 <    0.000004>] PGD 0 P4D 0
> > > > > [  925.738245 <    0.000005>] Oops: 0000 [#1] SMP PTI
> > > > > [  925.738249 <    0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
> > > > > [  925.738256 <    0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
> > > > > [  925.738266 <    0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110
> > > > > [  925.738270 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
> > > > > [  925.738282 <    0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246
> > > > > [  925.738287 <    0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
> > > > > [  925.738292 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000
> > > > > [  925.738297 <    0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000
> > > > > [  925.738302 <    0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000
> > > > > [  925.738307 <    0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130
> > > > > [  925.738313 <    0.000006>] FS:  00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000
> > > > > [  925.738319 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [  925.738323 <    0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0
> > > > > [  925.738329 <    0.000006>] Call Trace:
> > > > > [  925.738334 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
> > > > > [  925.738339 <    0.000005>]  sysfs_remove_group+0x25/0x80
> > > > > [  925.738344 <    0.000005>]  sysfs_remove_groups+0x29/0x40
> > > > > [  925.738350 <    0.000006>]  free_msi_irqs+0xf5/0x190
> > > > > [  925.738354 <    0.000004>]  pci_disable_msi+0xe9/0x120
> > > > So the PCI core is trying to clean up attributes that it had registered,
> > > > which is fine.  But we can't seem to find the attributes?  Were they
> > > > already removed somewhere else?
> > > > 
> > > > that's odd.
> > > 
> > > Yes, as I pointed out above, I am emulating device removal from sysfs, and
> > > this triggers the PCI device remove sequence; as part of that, my specific
> > > device folder (05:00.0) is removed from the sysfs tree.
> > But why are things being removed twice?
> 
> 
> Not sure I understand. What is removed twice? I remove each sysfs attribute only once.

This code path shows that the kernel is trying to remove a file that is
not present, so someone removed it already...

thanks,

greg k-h
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page.
  2020-06-24  3:31     ` Andrey Grodzovsky
@ 2020-06-24  7:19       ` Daniel Vetter
  0 siblings, 0 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-06-24  7:19 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Tue, Jun 23, 2020 at 11:31:45PM -0400, Andrey Grodzovsky wrote:
> 
> On 6/22/20 5:41 AM, Daniel Vetter wrote:
> > On Sun, Jun 21, 2020 at 02:03:02AM -0400, Andrey Grodzovsky wrote:
> > > On device removal reroute all CPU mappings to dummy page per drm_file
> > > instance or imported GEM object.
> > > 
> > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > ---
> > >   drivers/gpu/drm/ttm/ttm_bo_vm.c | 65 ++++++++++++++++++++++++++++++++++++-----
> > >   1 file changed, 57 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > index 389128b..2f8bf5e 100644
> > > --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> > > @@ -35,6 +35,8 @@
> > >   #include <drm/ttm/ttm_bo_driver.h>
> > >   #include <drm/ttm/ttm_placement.h>
> > >   #include <drm/drm_vma_manager.h>
> > > +#include <drm/drm_drv.h>
> > > +#include <drm/drm_file.h>
> > >   #include <linux/mm.h>
> > >   #include <linux/pfn_t.h>
> > >   #include <linux/rbtree.h>
> > > @@ -328,19 +330,66 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
> > Hm I think diff and code flow look a bit bad now. What about renaming the
> > current function to __ttm_bo_vm_fault and then having something like the
> > below:
> > 
> > ttm_bo_vm_fault(args) {
> > 
> > 	if (drm_dev_enter()) {
> > 		__ttm_bo_vm_fault(args);
> > 		drm_dev_exit();
> > 	} else  {
> > 		drm_gem_insert_dummy_pfn();
> > 	}
> > }
> > 
> > I think drm_gem_insert_dummy_pfn(); should be portable across drivers, so
> > another nice point to try to unify drivers as much as possible.
> > -Daniel
> > 
> > >   	pgprot_t prot;
> > >   	struct ttm_buffer_object *bo = vma->vm_private_data;
> > >   	vm_fault_t ret;
> > > +	int idx;
> > > +	struct drm_device *ddev = bo->base.dev;
> > > -	ret = ttm_bo_vm_reserve(bo, vmf);
> > > -	if (ret)
> > > -		return ret;
> > > +	if (drm_dev_enter(ddev, &idx)) {
> > > +		ret = ttm_bo_vm_reserve(bo, vmf);
> > > +		if (ret)
> > > +			goto exit;
> > > +
> > > +		prot = vma->vm_page_prot;
> > > -	prot = vma->vm_page_prot;
> > > -	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
> > > -	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> > > +		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
> > > +		if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
> > > +			goto exit;
> > > +
> > > +		dma_resv_unlock(bo->base.resv);
> > > +
> > > +exit:
> > > +		drm_dev_exit(idx);
> > >   		return ret;
> > > +	} else {
> > > -	dma_resv_unlock(bo->base.resv);
> > > +		struct drm_file *file = NULL;
> > > +		struct page *dummy_page = NULL;
> > > +		int handle;
> > > -	return ret;
> > > +		/* We are faulting on imported BO from dma_buf */
> > > +		if (bo->base.dma_buf && bo->base.import_attach) {
> > > +			dummy_page = bo->base.dummy_page;
> > > +		/* We are faulting on non imported BO, find drm_file owning the BO*/
> > Uh, we can't fish that out of the vma->vm_file pointer somehow? Or is that
> > one all wrong? Doing this kind of list walk looks pretty horrible.
> > 
> > If the vma doesn't have the right pointer I guess next option is that we
> > store the drm_file's dummy page in gem_bo->dummy_page, and replace it on first
> > export. But that's going to be tricky to track ...
> > 
> > > +		} else {
> > > +			struct drm_gem_object *gobj;
> > > +
> > > +			mutex_lock(&ddev->filelist_mutex);
> > > +			list_for_each_entry(file, &ddev->filelist, lhead) {
> > > +				spin_lock(&file->table_lock);
> > > +				idr_for_each_entry(&file->object_idr, gobj, handle) {
> > > +					if (gobj == &bo->base) {
> > > +						dummy_page = file->dummy_page;
> > > +						break;
> > > +					}
> > > +				}
> > > +				spin_unlock(&file->table_lock);
> > > +			}
> > > +			mutex_unlock(&ddev->filelist_mutex);
> > > +		}
> > > +
> > > +		if (dummy_page) {
> > > +			/*
> > > +			 * Let do_fault complete the PTE install e.t.c using vmf->page
> > > +			 *
> > > +			 * TODO - should i call free_page somewhere ?
> > Nah, instead don't call get_page. The page will be around as long as
> > there's a reference for the drm_file or gem_bo, which is longer than any
> > mmap. Otherwise yes this would leak really badly.
> 
> 
> So actually that was my thinking in the first place, and I indeed avoided
> taking a reference, which ended up producing multiple BUG_ONs, as seen below,
> where refcount:-63 mapcount:-48 for a page are deep into negative values...
> Those warnings were gone once I added get_page(dummy), which in my opinion
> implies that there is one page reference per PTE: when the process address
> space is unmapped and the PTEs are deleted, there is also a put_page somewhere
> in mm core, and the get_page per mapping keeps it balanced.
> 
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762929] BUG: Bad page map in
> process glxgear:disk$0  pte:8000000132284867 pmd:15aaec067
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762931]
> page:ffffe63384c8a100 refcount:-63 mapcount:-48 mapping:0000000000000000
> index:0x0
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762932] flags:
> 0x17fff8000000008(dirty)
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762933] raw:
> 017fff8000000008 dead000000000100 dead000000000122 0000000000000000
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762934] raw:
> 0000000000000000 0000000000000000 ffffffc1ffffffcf 0000000000000000
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762935] page dumped because: bad pte
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762937]
> addr:00007fe086263000 vm_flags:1c0440fb anon_vma:0000000000000000
> mapping:ffff9b5cd42db268 index:1008b3
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762981] file:renderD129
> fault:ttm_bo_vm_fault [ttm] mmap:amdgpu_mmap [amdgpu] readpage:0x0
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762984] CPU: 5 PID: 2619
> Comm: glxgear:disk$0 Tainted: G    B      OE 5.6.0-dev+ #51
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762985] Hardware name:
> System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804
> 12/30/2013
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762985] Call Trace:
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762988] dump_stack+0x68/0x9b
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762990] print_bad_pte+0x19f/0x270
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762992]  ? lock_page_memcg+0x5/0xf0
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.762995] unmap_page_range+0x777/0xbe0
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763000] unmap_vmas+0xcc/0x160
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763004] exit_mmap+0xb5/0x1b0
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763009] mmput+0x65/0x140
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763010] do_exit+0x362/0xc40
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763013] do_group_exit+0x47/0xb0
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763016] get_signal+0x18b/0xc30
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763019] do_signal+0x36/0x6a0
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763021]  ?
> __set_task_comm+0x62/0x120
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763024]  ?
> __x64_sys_futex+0x88/0x180
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763028]
> exit_to_usermode_loop+0x6f/0xc0
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763030] do_syscall_64+0x149/0x1c0
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763032]
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763034] RIP: 0033:0x7fe091bd9360
> Jun 20 01:36:43 ubuntu-1604-test kernel: [   98.763037] Code: Bad RIP value.

Uh, I guess that just shows how little I understand how this all works.
But yeah if we set vmf->page then I guess core mm takes care of
everything, but apparently expects a page reference.
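
Just so the mechanics are on record, the dummy-page branch then boils down to
something like this (a sketch only; the function name is made up, and the
matching put_page() is done by core mm when the PTEs are torn down on unmap):

/*
 * Sketch of the dummy-page branch: vmf->page must carry a reference,
 * which core mm drops again via put_page() when the PTE is removed in
 * the unmap path, so get_page() per fault keeps the count balanced.
 */
static vm_fault_t ttm_bo_vm_dummy_fault(struct vm_fault *vmf,
					struct page *dummy_page)
{
	get_page(dummy_page);	/* one reference per installed PTE */
	vmf->page = dummy_page;

	/* do_fault() completes the PTE install from vmf->page */
	return 0;
}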
-Daniel
 
> Andrey
> 
> 
> > 
> > > +			 */
> > > +			get_page(dummy_page);
> > > +			vmf->page = dummy_page;
> > > +			return 0;
> > > +		} else {
> > > +			return VM_FAULT_SIGSEGV;
> > Hm that would be a kernel bug, wouldn't it? WARN_ON() required here imo.
> > -Daniel
> > 
> > > +		}
> > > +	}
> > >   }
> > >   EXPORT_SYMBOL(ttm_bo_vm_fault);
> > > -- 
> > > 2.7.4
> > > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-24  6:11                 ` Greg KH
@ 2020-06-25  1:52                   ` Andrey Grodzovsky
  2020-11-10 17:54                   ` Andrey Grodzovsky
  1 sibling, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-06-25  1:52 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


On 6/24/20 2:11 AM, Greg KH wrote:
> On Tue, Jun 23, 2020 at 11:04:30PM -0400, Andrey Grodzovsky wrote:
>> On 6/23/20 2:05 AM, Greg KH wrote:
>>> On Tue, Jun 23, 2020 at 12:51:00AM -0400, Andrey Grodzovsky wrote:
>>>> On 6/22/20 12:45 PM, Greg KH wrote:
>>>>> On Mon, Jun 22, 2020 at 12:07:25PM -0400, Andrey Grodzovsky wrote:
>>>>>> On 6/22/20 7:21 AM, Greg KH wrote:
>>>>>>> On Mon, Jun 22, 2020 at 11:51:24AM +0200, Daniel Vetter wrote:
>>>>>>>> On Sun, Jun 21, 2020 at 02:03:05AM -0400, Andrey Grodzovsky wrote:
>>>>>>>>> Track sysfs files in a list so they can all be removed during pci remove,
>>>>>>>>> since otherwise removing them later crashes because the parent
>>>>>>>>> folder was already removed during pci remove.
>>>>>>> Huh?  That should not happen, do you have a backtrace of that crash?
>>>>>> 2 examples in the attached trace.
>>>>> Odd, how did you trigger these?
>>>> By manually triggering PCI remove from sysfs
>>>>
>>>> cd /sys/bus/pci/devices/0000\:05\:00.0 && echo 1 > remove
>>> For some reason, I didn't think that video/drm devices could handle
>>> hot-remove like this.  The "old" PCI hotplug specification explicitly
>>> said that video devices were not supported; has that changed?
>>>
>>> And this whole issue is probably tied to the larger issue that Daniel
>>> was asking me about, when it came to device lifetimes and the drm layer,
>>> so odds are we need to fix that up first before worrying about trying to
>>> support this crazy request, right?  :)
>>>
>>>>>> [  925.738225 <    0.188086>] BUG: kernel NULL pointer dereference, address: 0000000000000090
>>>>>> [  925.738232 <    0.000007>] #PF: supervisor read access in kernel mode
>>>>>> [  925.738236 <    0.000004>] #PF: error_code(0x0000) - not-present page
>>>>>> [  925.738240 <    0.000004>] PGD 0 P4D 0
>>>>>> [  925.738245 <    0.000005>] Oops: 0000 [#1] SMP PTI
>>>>>> [  925.738249 <    0.000004>] CPU: 7 PID: 2547 Comm: amdgpu_test Tainted: G        W  OE     5.5.0-rc7-dev-kfd+ #50
>>>>>> [  925.738256 <    0.000007>] Hardware name: System manufacturer System Product Name/RAMPAGE IV FORMULA, BIOS 4804 12/30/2013
>>>>>> [  925.738266 <    0.000010>] RIP: 0010:kernfs_find_ns+0x18/0x110
>>>>>> [  925.738270 <    0.000004>] Code: a6 cf ff 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 41 57 41 56 49 89 f6 41 55 41 54 49 89 fd 55 53 49 89 d4 <0f> b7 af 90 00 00 00 8b 05 8f ee 6b 01 48 8b 5f 68 66 83 e5 20 41
>>>>>> [  925.738282 <    0.000012>] RSP: 0018:ffffad6d0118fb00 EFLAGS: 00010246
>>>>>> [  925.738287 <    0.000005>] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 2098a12076864b7e
>>>>>> [  925.738292 <    0.000005>] RDX: 0000000000000000 RSI: ffffffffb6606b31 RDI: 0000000000000000
>>>>>> [  925.738297 <    0.000005>] RBP: ffffffffb6606b31 R08: ffffffffb5379d10 R09: 0000000000000000
>>>>>> [  925.738302 <    0.000005>] R10: ffffad6d0118fb38 R11: ffff9a75f64820a8 R12: 0000000000000000
>>>>>> [  925.738307 <    0.000005>] R13: 0000000000000000 R14: ffffffffb6606b31 R15: ffff9a7612b06130
>>>>>> [  925.738313 <    0.000006>] FS:  00007f3eca4e8700(0000) GS:ffff9a763dbc0000(0000) knlGS:0000000000000000
>>>>>> [  925.738319 <    0.000006>] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [  925.738323 <    0.000004>] CR2: 0000000000000090 CR3: 0000000035e5a005 CR4: 00000000000606e0
>>>>>> [  925.738329 <    0.000006>] Call Trace:
>>>>>> [  925.738334 <    0.000005>]  kernfs_find_and_get_ns+0x2e/0x50
>>>>>> [  925.738339 <    0.000005>]  sysfs_remove_group+0x25/0x80
>>>>>> [  925.738344 <    0.000005>]  sysfs_remove_groups+0x29/0x40
>>>>>> [  925.738350 <    0.000006>]  free_msi_irqs+0xf5/0x190
>>>>>> [  925.738354 <    0.000004>]  pci_disable_msi+0xe9/0x120
>>>>> So the PCI core is trying to clean up attributes that it had registered,
>>>>> which is fine.  But we can't seem to find the attributes?  Were they
>>>>> already removed somewhere else?
>>>>>
>>>>> that's odd.
>>>> Yes, as I pointed out above, I am emulating device removal from sysfs, and this
>>>> triggers the PCI device remove sequence; as part of that, my specific device
>>>> folder (05:00.0) is removed from the sysfs tree.
>>> But why are things being removed twice?
>>
>> Not sure I understand. What is removed twice? I only remove once per sysfs attribute.
> This code path shows that the kernel is trying to remove a file that is
> not present, so someone removed it already...
>
> thanks,
>
> greg k-h


That's a mystery for me too...

Andrey



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22  9:35   ` Daniel Vetter
  2020-06-22 14:21     ` Pekka Paalanen
@ 2020-11-09 20:34     ` Andrey Grodzovsky
  2020-11-15  6:39     ` Andrey Grodzovsky
  2 siblings, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-09 20:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher


On 6/22/20 5:35 AM, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
>> Will be used to reroute CPU mapped BO's page faults once
>> device is removed.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>   include/drm/drm_file.h      |  2 ++
>>   include/drm/drm_gem.h       |  2 ++
>>   4 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index c4c704e..67c0770 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>   			goto out_prime_destroy;
>>   	}
>>   
>> +	file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +	if (!file->dummy_page) {
>> +		ret = -ENOMEM;
>> +		goto out_prime_destroy;
>> +	}
>> +
>>   	return file;
>>   
>>   out_prime_destroy:
>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>   	if (dev->driver->postclose)
>>   		dev->driver->postclose(dev, file);
>>   
>> +	__free_page(file->dummy_page);
>> +
>>   	drm_prime_destroy_file_private(&file->prime);
>>   
>>   	WARN_ON(!list_empty(&file->event_list));
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index 1de2cde..c482e9c 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>   
>>   	ret = drm_prime_add_buf_handle(&file_priv->prime,
>>   			dma_buf, *handle);
>> +
>> +	if (!ret) {
>> +		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +		if (!obj->dummy_page)
>> +			ret = -ENOMEM;
>> +	}
>> +
>>   	mutex_unlock(&file_priv->prime.lock);
>>   	if (ret)
>>   		goto fail;
>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
>>   		dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>   	dma_buf = attach->dmabuf;
>>   	dma_buf_detach(attach->dmabuf, attach);
>> +
>> +	__free_page(obj->dummy_page);
>> +
>>   	/* remove the reference */
>>   	dma_buf_put(dma_buf);
>>   }
>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>> index 19df802..349a658 100644
>> --- a/include/drm/drm_file.h
>> +++ b/include/drm/drm_file.h
>> @@ -335,6 +335,8 @@ struct drm_file {
>>   	 */
>>   	struct drm_prime_file_private prime;
>>   
> Kerneldoc for these please, including why we need them and when. E.g. the
> one in gem_bo should say it's only for exported buffers, so that we're not
> colliding security spaces.
>
>> +	struct page *dummy_page;
>> +
>>   	/* private: */
>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>   	unsigned long lock_count; /* DRI1 legacy lock count */
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index 0b37506..47460d1 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>   	 *
>>   	 */
>>   	const struct drm_gem_object_funcs *funcs;
>> +
>> +	struct page *dummy_page;
>>   };
> I think amdgpu doesn't care, but everyone else still might care somewhat
> about flink. That also shares buffers, so also needs to allocate the
> per-bo dummy page.


Hi, back to this topic after a long context switch for an internal project.

I don't see why for FLINK we can't use the same dummy page from struct
drm_gem_object: looking at drm_gem_flink_ioctl I see that the underlying
object we look up is still of type drm_gem_object.
Why do we need a per-BO (TTM BO, I assume?) dummy page for this?
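
If we do want the lookup helper you suggest below, a minimal sketch of the
cascade could look like this (drm_gem_get_dummy_page() is a made-up name; it
assumes the per-object page is only ever allocated for shared objects):

/*
 * Sketch: shared objects (dma-buf import/export, flink) use their own
 * dummy page; private objects fall back to the per-drm_file page.
 */
static struct page *drm_gem_get_dummy_page(struct drm_gem_object *obj,
					   struct drm_file *file)
{
	if (obj->dummy_page)
		return obj->dummy_page;

	return file->dummy_page;
}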

Andrey


>
> I also wonder whether we shouldn't have a helper to look up the dummy
> page, just to encode in core code how it's supposed to cascade.
> -Daniel
>
>>   
>>   /**
>> -- 
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22 17:50         ` Daniel Vetter
@ 2020-11-09 20:53           ` Andrey Grodzovsky
  2020-11-13 20:52           ` Andrey Grodzovsky
  1 sibling, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-09 20:53 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: Alex Deucher, Michel Dänzer, Pekka Paalanen, dri-devel,
	amd-gfx list


On 6/22/20 1:50 PM, Daniel Vetter wrote:
> On Mon, Jun 22, 2020 at 7:45 PM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>> device is removed.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>    drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>    include/drm/drm_file.h      |  2 ++
>>>>>    include/drm/drm_gem.h       |  2 ++
>>>>>    4 files changed, 22 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>> index c4c704e..67c0770 100644
>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>> drm_minor *minor)
>>>>>                goto out_prime_destroy;
>>>>>        }
>>>>>    +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>> +    if (!file->dummy_page) {
>>>>> +        ret = -ENOMEM;
>>>>> +        goto out_prime_destroy;
>>>>> +    }
>>>>> +
>>>>>        return file;
>>>>>      out_prime_destroy:
>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>        if (dev->driver->postclose)
>>>>>            dev->driver->postclose(dev, file);
>>>>>    +    __free_page(file->dummy_page);
>>>>> +
>>>>>        drm_prime_destroy_file_private(&file->prime);
>>>>>          WARN_ON(!list_empty(&file->event_list));
>>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>>>> index 1de2cde..c482e9c 100644
>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>> drm_device *dev,
>>>>>          ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>                dma_buf, *handle);
>>>>> +
>>>>> +    if (!ret) {
>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>> +        if (!obj->dummy_page)
>>>>> +            ret = -ENOMEM;
>>>>> +    }
>>>>> +
>>>> While the per file case still looks acceptable this is a clear NAK
>>>> since it will massively increase the memory needed for a prime
>>>> exported object.
>>>>
>>>> I think that this is quite overkill in the first place and for the
>>>> hot unplug case we can just use the global dummy page as well.
>>>>
>>>> Christian.
>>>
>>> A global dummy page is good for read access, but what do you do on write
>>> access? My first approach was indeed to map the global dummy page as read
>>> only at first and mark the vma->vm_flags as !VM_SHARED, assuming that this
>>> would trigger the Copy On Write flow in core mm
>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
>>> on the next page fault to the same address triggered by a write access, but
>>> then I realized a new COW page will be allocated for each such mapping,
>>> and this is much more wasteful than having a dedicated page per GEM
>>> object.
>> Yeah, but this is only for a very, very small corner case. What we need
>> to prevent is increasing the memory usage during normal operation too much.
>>
>> Using memory during the unplug is completely unproblematic because we
>> just released quite a bunch of it by releasing all those system memory
>> buffers.
>>
>> And I'm pretty sure that COWed pages are correctly accounted towards the
>> used memory of a process.
>>
>> So I think if that approach works as intended and the COW pages are
>> released again on unmapping it would be the perfect solution to the problem.
>>
>> Daniel what do you think?
> If COW works, sure sounds reasonable. And if we can make sure we
> managed to drop all the system allocations (otherwise suddenly 2x
> memory usage, worst case). But I have no idea whether we can
> retroshoehorn that into an established vma, you might have fun stuff
> like a mkwrite handler there (which I thought is the COW handler
> thing, but really no idea).


Can you clarify your concern here? I see no DRM driver besides vmwgfx
that installs a handler for vm_operations_struct.page_mkwrite, and in any
case, since I will be turning off the VM_SHARED flag for the faulting
vm_area_struct, making it a COW mapping, page_mkwrite will not be called
on any subsequent vm fault.
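
To make the idea concrete, this is roughly what I have in mind (an untested
sketch; drm_gem_global_dummy_page is a made-up symbol for the single
read-only page, and whether retrofitting vm_flags on an established VMA is
safe is exactly the open question):

/*
 * Map the global dummy page read-only and drop VM_SHARED so that a
 * later write fault goes through the regular COW path in core mm
 * instead of page_mkwrite.
 */
static vm_fault_t map_global_dummy_cow(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	vma->vm_flags &= ~VM_SHARED;	/* writes now trigger COW */

	get_page(drm_gem_global_dummy_page);
	vmf->page = drm_gem_global_dummy_page;
	return 0;
}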

Andrey


>
> If we need to massively change stuff then I think rw dummy page,
> allocated on first fault after hotunplug (maybe just make it one per
> object, that's simplest) seems like the much safer option. Much less
> code that can go wrong.
> -Daniel
>
>> Regards,
>> Christian.
>>
>>> We can indeed optimize by allocating this dummy page on the first page
>>> fault after device disconnect instead of on GEM object creation.
>>>
>>> Andrey
>>>
>>>
>>>>> mutex_unlock(&file_priv->prime.lock);
>>>>>        if (ret)
>>>>>            goto fail;
>>>>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct
>>>>> drm_gem_object *obj, struct sg_table *sg)
>>>>>            dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>>        dma_buf = attach->dmabuf;
>>>>>        dma_buf_detach(attach->dmabuf, attach);
>>>>> +
>>>>> +    __free_page(obj->dummy_page);
>>>>> +
>>>>>        /* remove the reference */
>>>>>        dma_buf_put(dma_buf);
>>>>>    }
>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>> index 19df802..349a658 100644
>>>>> --- a/include/drm/drm_file.h
>>>>> +++ b/include/drm/drm_file.h
>>>>> @@ -335,6 +335,8 @@ struct drm_file {
>>>>>         */
>>>>>        struct drm_prime_file_private prime;
>>>>>    +    struct page *dummy_page;
>>>>> +
>>>>>        /* private: */
>>>>>    #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>>        unsigned long lock_count; /* DRI1 legacy lock count */
>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>> index 0b37506..47460d1 100644
>>>>> --- a/include/drm/drm_gem.h
>>>>> +++ b/include/drm/drm_gem.h
>>>>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>>>>         *
>>>>>         */
>>>>>        const struct drm_gem_object_funcs *funcs;
>>>>> +
>>>>> +    struct page *dummy_page;
>>>>>    };
>>>>>      /**
>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page.
  2020-06-22  9:41   ` Daniel Vetter
  2020-06-24  3:31     ` Andrey Grodzovsky
@ 2020-11-10 17:41     ` Andrey Grodzovsky
  1 sibling, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-10 17:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher


On 6/22/20 5:41 AM, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 02:03:02AM -0400, Andrey Grodzovsky wrote:
>> On device removal reroute all CPU mappings to dummy page per drm_file
>> instance or imported GEM object.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo_vm.c | 65 ++++++++++++++++++++++++++++++++++++-----
>>   1 file changed, 57 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> index 389128b..2f8bf5e 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> @@ -35,6 +35,8 @@
>>   #include <drm/ttm/ttm_bo_driver.h>
>>   #include <drm/ttm/ttm_placement.h>
>>   #include <drm/drm_vma_manager.h>
>> +#include <drm/drm_drv.h>
>> +#include <drm/drm_file.h>
>>   #include <linux/mm.h>
>>   #include <linux/pfn_t.h>
>>   #include <linux/rbtree.h>
>> @@ -328,19 +330,66 @@ vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
> Hm I think diff and code flow look a bit bad now. What about renaming the
> current function to __ttm_bo_vm_fault and then having something like the
> below:
>
> ttm_bo_vm_fault(args) {
>
> 	if (drm_dev_enter()) {
> 		__ttm_bo_vm_fault(args);
> 		drm_dev_exit();
> 	} else  {
> 		drm_gem_insert_dummy_pfn();
> 	}
> }
>
> I think drm_gem_insert_dummy_pfn() should be portable across drivers, so
> another nice point to try to unify drivers as much as possible.
> -Daniel
>
>>   	pgprot_t prot;
>>   	struct ttm_buffer_object *bo = vma->vm_private_data;
>>   	vm_fault_t ret;
>> +	int idx;
>> +	struct drm_device *ddev = bo->base.dev;
>>   
>> -	ret = ttm_bo_vm_reserve(bo, vmf);
>> -	if (ret)
>> -		return ret;
>> +	if (drm_dev_enter(ddev, &idx)) {
>> +		ret = ttm_bo_vm_reserve(bo, vmf);
>> +		if (ret)
>> +			goto exit;
>> +
>> +		prot = vma->vm_page_prot;
>>   
>> -	prot = vma->vm_page_prot;
>> -	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
>> -	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
>> +		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT);
>> +		if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
>> +			goto exit;
>> +
>> +		dma_resv_unlock(bo->base.resv);
>> +
>> +exit:
>> +		drm_dev_exit(idx);
>>   		return ret;
>> +	} else {
>>   
>> -	dma_resv_unlock(bo->base.resv);
>> +		struct drm_file *file = NULL;
>> +		struct page *dummy_page = NULL;
>> +		int handle;
>>   
>> -	return ret;
>> +		/* We are faulting on imported BO from dma_buf */
>> +		if (bo->base.dma_buf && bo->base.import_attach) {
>> +			dummy_page = bo->base.dummy_page;
>> +		/* We are faulting on non imported BO, find drm_file owning the BO*/
> Uh, we can't fish that out of the vma->vm_file pointer somehow? Or is that
> one all wrong? Doing this kind of list walk looks pretty horrible.
>
> If the vma doesn't have the right pointer I guess next option is that we
> store the drm_file's dummy page in gem_bo->dummy_page, and replace it on first
> export. But that's going to be tricky to track ...


For this one I hope to make all of this obsolete if Christian's suggestion
from patch 1/8, mapping a global RO dummy page for reads with COW on writes,
turns out to be possible to implement (testing that memory usage indeed
doesn't explode).
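
And should the COW approach not pan out, the fallback mentioned earlier in
the thread, a per-object rw dummy page allocated lazily on the first fault
after unplug, would look roughly like this (sketch only, made-up function
name; a real version would need to serialize against concurrent faults,
e.g. under the bo reservation):

static vm_fault_t ttm_bo_vm_fault_unplugged(struct ttm_buffer_object *bo,
					    struct vm_fault *vmf)
{
	/* allocate on first fault after removal, not at export time */
	if (!bo->base.dummy_page) {
		bo->base.dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!bo->base.dummy_page)
			return VM_FAULT_OOM;
	}

	get_page(bo->base.dummy_page);
	vmf->page = bo->base.dummy_page;
	return 0;
}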

Andrey


>
>> +		} else {
>> +			struct drm_gem_object *gobj;
>> +
>> +			mutex_lock(&ddev->filelist_mutex);
>> +			list_for_each_entry(file, &ddev->filelist, lhead) {
>> +				spin_lock(&file->table_lock);
>> +				idr_for_each_entry(&file->object_idr, gobj, handle) {
>> +					if (gobj == &bo->base) {
>> +						dummy_page = file->dummy_page;
>> +						break;
>> +					}
>> +				}
>> +				spin_unlock(&file->table_lock);
>> +			}
>> +			mutex_unlock(&ddev->filelist_mutex);
>> +		}
>> +
>> +		if (dummy_page) {
>> +			/*
>> +			 * Let do_fault complete the PTE install e.t.c using vmf->page
>> +			 *
>> +			 * TODO - should i call free_page somewhere ?
> Nah, instead don't call get_page. The page will be around as long as
> there's a reference for the drm_file or gem_bo, which is longer than any
> mmap. Otherwise yes this would leak really badly.
>
>> +			 */
>> +			get_page(dummy_page);
>> +			vmf->page = dummy_page;
>> +			return 0;
>> +		} else {
>> +			return VM_FAULT_SIGSEGV;
> Hm that would be a kernel bug, wouldn't it? WARN_ON() required here imo.
> -Daniel
>
>> +		}
>> +	}
>>   }
>>   EXPORT_SYMBOL(ttm_bo_vm_fault);
>>   
>> -- 
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-06-24  6:11                 ` Greg KH
  2020-06-25  1:52                   ` Andrey Grodzovsky
@ 2020-11-10 17:54                   ` Andrey Grodzovsky
  2020-11-10 17:59                     ` Greg KH
  1 sibling, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-10 17:54 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


Hi, back to this after a long context switch for some higher priority stuff.

So here I was eventually able to drop all this code, and this change here
https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
was enough for me. It seems that while device_remove_file can handle the use
case where the file and the parent directory are already gone,
sysfs_remove_group goes down in flames in that case due to kobj->sd being
unset on device removal.
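
To illustrate the difference (a sketch only, not something I am proposing to
merge; the helper name is made up): device_remove_file() survives because
kernfs_remove_by_name_ns() bails out with -ENOENT when the parent directory
is gone, while sysfs_remove_group() starts its lookup from kobj->sd, which
kobject_del() has already cleared, so it oopses in kernfs_find_ns():

static void example_remove_group_tolerant(struct kobject *kobj,
					  const struct attribute_group *grp)
{
	/* kobj->sd is NULL once the device directory was removed */
	if (!kobj->sd)
		return;

	sysfs_remove_group(kobj, grp);
}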

Andrey

On 6/24/20 2:11 AM, Greg KH wrote:
>>> But why are things being removed twice?
>> Not sure I understand what removed twice ? I remove only once per sysfs attribute.
> This code path shows that the kernel is trying to remove a file that is
> not present, so someone removed it already...
>
> thanks,
>
> greg k-h
>


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-11-10 17:54                   ` Andrey Grodzovsky
@ 2020-11-10 17:59                     ` Greg KH
  2020-11-11 15:13                       ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Greg KH @ 2020-11-10 17:59 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> Hi, back to this after a long context switch for some higher priority stuff.
> 
> So here I was eventually able to drop all this code, and this change here https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
> was enough for me. It seems that while device_remove_file can handle the use
> case where the file and the parent directory are already gone,
> sysfs_remove_group goes down in flames in that case
> due to kobj->sd being unset on device removal.

A driver shouldn't ever have to remove individual sysfs groups, the
driver core/bus logic should do it for them automatically.

And whenever a driver calls a sysfs_* call, that's a hint that something
is not working properly.

Also, run your patch above through checkpatch.pl before submitting it :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-11-10 17:59                     ` Greg KH
@ 2020-11-11 15:13                       ` Andrey Grodzovsky
  2020-11-11 15:34                         ` Greg KH
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-11 15:13 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


On 11/10/20 12:59 PM, Greg KH wrote:
> On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
>> Hi, back to this after a long context switch for some higher priority stuff.
>>
>> So here I was eventually able to drop all this code, and this change here https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
>> was enough for me. It seems that while device_remove_file can handle the use
>> case where the file and the parent directory are already gone,
>> sysfs_remove_group goes down in flames in that case
>> due to kobj->sd being unset on device removal.
> A driver shouldn't ever have to remove individual sysfs groups, the
> driver core/bus logic should do it for them automatically.
>
> And whenever a driver calls a sysfs_* call, that's a hint that something
> is not working properly.



Do you mean that while the driver creates the groups and files explicitly
from its different subsystems, it should not explicitly remove each one of
them, because all of them should be removed at once (and recursively) when
the device is being removed?

Andrey


>
> Also, run your patch above through checkpatch.pl before submitting it :)
>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-11-11 15:13                       ` Andrey Grodzovsky
@ 2020-11-11 15:34                         ` Greg KH
  2020-11-11 15:45                           ` Andrey Grodzovsky
  2020-12-02 15:48                           ` Andrey Grodzovsky
  0 siblings, 2 replies; 97+ messages in thread
From: Greg KH @ 2020-11-11 15:34 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/10/20 12:59 PM, Greg KH wrote:
> > On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> > > Hi, back to this after a long context switch for some higher priority stuff.
> > > 
> > > So here I was eventually able to drop all this code, and this change here https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
> > > was enough for me. It seems that while device_remove_file can handle the use
> > > case where the file and the parent directory are already gone,
> > > sysfs_remove_group goes down in flames in that case
> > > due to kobj->sd being unset on device removal.
> > A driver shouldn't ever have to remove individual sysfs groups, the
> > driver core/bus logic should do it for them automatically.
> > 
> > And whenever a driver calls a sysfs_* call, that's a hint that something
> > is not working properly.
> 
> 
> 
> Do you mean that while the driver creates the groups and files explicitly
> from its different subsystems it should not explicitly remove each
> one of them because all of them should be removed at once (and
> recursively) when the device is being removed ?

Individual drivers should never add groups/files in sysfs, the driver
core should do it properly for you if you have everything set up
properly.  And yes, the driver core will automatically remove them as
well.

Please use the default groups attribute for your bus/subsystem and this
will happen automagically.
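
In sketch form the pattern is the following; the attribute and driver names
are made up, and the key part is wiring the groups up via dev_groups instead
of calling sysfs_create_group()/sysfs_remove_group() by hand:

static ssize_t serial_number_show(struct device *dev,
				  struct device_attribute *attr, char *buf)
{
	return scnprintf(buf, PAGE_SIZE, "0\n");
}
static DEVICE_ATTR_RO(serial_number);

static struct attribute *example_attrs[] = {
	&dev_attr_serial_number.attr,
	NULL,
};
ATTRIBUTE_GROUPS(example);

static struct pci_driver example_pci_driver = {
	.name	= "example",
	.driver	= {
		.dev_groups = example_groups,	/* core adds/removes these */
	},
	/* .id_table, .probe, .remove as usual */
};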

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-11-11 15:34                         ` Greg KH
@ 2020-11-11 15:45                           ` Andrey Grodzovsky
  2020-11-11 16:06                             ` Greg KH
  2020-12-02 15:48                           ` Andrey Grodzovsky
  1 sibling, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-11 15:45 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


On 11/11/20 10:34 AM, Greg KH wrote:
> On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
>> On 11/10/20 12:59 PM, Greg KH wrote:
>>> On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
>>>> Hi, back to this after a long context switch for some higher priority stuff.
>>>>
>>>> So here I was eventually able to drop all this code, and this change here https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
>>>> was enough for me. It seems that while device_remove_file can handle the use
>>>> case where the file and the parent directory are already gone,
>>>> sysfs_remove_group goes down in flames in that case
>>>> due to kobj->sd being unset on device removal.
>>> A driver shouldn't ever have to remove individual sysfs groups, the
>>> driver core/bus logic should do it for them automatically.
>>>
>>> And whenever a driver calls a sysfs_* call, that's a hint that something
>>> is not working properly.
>>
>>
>> Do you mean that while the driver creates the groups and files explicitly
>> from its different subsystems it should not explicitly remove each
>> one of them because all of them should be removed at once (and
>> recursively) when the device is being removed ?
> Individual drivers should never add groups/files in sysfs, the driver
> core should do it properly for you if you have everything set up
> properly.  And yes, the driver core will automatically remove them as
> well.
>
> Please use the default groups attribute for your bus/subsystem and this
> will happen automagically.

Googling for default groups attributes I found this:
https://www.linuxfoundation.org/blog/2013/06/how-to-create-a-sysfs-file-correctly/
Would this be what you suggest for us? Specifically, for our case the struct
device's groups seems the right solution, as different devices
might have slightly different sysfs attributes.

Andrey


>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-11-11 15:45                           ` Andrey Grodzovsky
@ 2020-11-11 16:06                             ` Greg KH
  2020-11-11 16:34                               ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Greg KH @ 2020-11-11 16:06 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Wed, Nov 11, 2020 at 10:45:53AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/11/20 10:34 AM, Greg KH wrote:
> > On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
> > > On 11/10/20 12:59 PM, Greg KH wrote:
> > > > On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> > > > > Hi, back to this after a long context switch for some higher priority stuff.
> > > > > 
> > > > > So here I was eventually able to drop all this code, and this change here https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
> > > > > was enough for me. It seems that while device_remove_file can handle the use
> > > > > case where the file and the parent directory are already gone,
> > > > > sysfs_remove_group goes down in flames in that case
> > > > > due to kobj->sd being unset on device removal.
> > > > A driver shouldn't ever have to remove individual sysfs groups, the
> > > > driver core/bus logic should do it for them automatically.
> > > > 
> > > > And whenever a driver calls a sysfs_* call, that's a hint that something
> > > > is not working properly.
> > > 
> > > 
> > > Do you mean that while the driver creates the groups and files explicitly
> > > from its different subsystems it should not explicitly remove each
> > > one of them because all of them should be removed at once (and
> > > recursively) when the device is being removed ?
> > Individual drivers should never add groups/files in sysfs, the driver
> > core should do it properly for you if you have everything set up
> > properly.  And yes, the driver core will automatically remove them as
> > well.
> > 
> > Please use the default groups attribute for your bus/subsystem and this
> > will happen automagically.
> 
> Googling for default groups attributes I found this: https://www.linuxfoundation.org/blog/2013/06/how-to-create-a-sysfs-file-correctly/

Odd, that's a mirror of the original article:
	http://kroah.com/log/blog/2013/06/26/how-to-create-a-sysfs-file-correctly/

> Would this be what you suggest for us? Specifically, for our case the struct
> device's groups seems the right solution, as different devices
> might have slightly different sysfs attributes.

That's what the is_visible() callback in your attribute group is for, to
tell the kernel if an individual sysfs attribute should be created or
not.
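
Roughly like this (a sketch; example_attrs is the attribute array from the
pattern above, and example_device_has_feature() is a made-up predicate):

static umode_t example_attr_is_visible(struct kobject *kobj,
				       struct attribute *attr, int n)
{
	struct device *dev = kobj_to_dev(kobj);

	/* expose the attribute only where the feature exists */
	if (!example_device_has_feature(dev, attr))
		return 0;	/* attribute is simply not created */

	return attr->mode;	/* otherwise keep the declared mode */
}

static const struct attribute_group example_attr_group = {
	.attrs		= example_attrs,
	.is_visible	= example_attr_is_visible,
};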

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-11-11 16:06                             ` Greg KH
@ 2020-11-11 16:34                               ` Andrey Grodzovsky
  0 siblings, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-11 16:34 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


On 11/11/20 11:06 AM, Greg KH wrote:
> On Wed, Nov 11, 2020 at 10:45:53AM -0500, Andrey Grodzovsky wrote:
>> On 11/11/20 10:34 AM, Greg KH wrote:
>>> On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
>>>> On 11/10/20 12:59 PM, Greg KH wrote:
>>>>> On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
>>>>>> Hi, back to this after a long context switch for some higher priority stuff.
>>>>>>
>>>>>> So here I was eventually able to drop all this code, and this change here https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
>>>>>> was enough for me. It seems that while device_remove_file can handle the use
>>>>>> case where the file and the parent directory are already gone,
>>>>>> sysfs_remove_group goes down in flames in that case
>>>>>> due to kobj->sd being unset on device removal.
>>>>> A driver shouldn't ever have to remove individual sysfs groups, the
>>>>> driver core/bus logic should do it for them automatically.
>>>>>
>>>>> And whenever a driver calls a sysfs_* call, that's a hint that something
>>>>> is not working properly.
>>>>
>>>> Do you mean that while the driver creates the groups and files explicitly
>>>> from its different subsystems it should not explicitly remove each
>>>> one of them because all of them should be removed at once (and
>>>> recursively) when the device is being removed ?
>>> Individual drivers should never add groups/files in sysfs, the driver
>>> core should do it properly for you if you have everything set up
>>> properly.  And yes, the driver core will automatically remove them as
>>> well.
>>>
>>> Please use the default groups attribute for your bus/subsystem and this
>>> will happen automagically.
>> Googling for default groups attributes I found this: https://www.linuxfoundation.org/blog/2013/06/how-to-create-a-sysfs-file-correctly/
> Odd, mirror of the original article:
> 	http://kroah.com/log/blog/2013/06/26/how-to-create-a-sysfs-file-correctly/
>
>> Would this be what you suggest for us ? Specifically for our case the struct
>> device's  groups  seems the right solution as different devices
>> might have slightly different sysfs attributes.
> That's what the is_visible() callback in your attribute group is for, to
> tell the kernel if an individual sysfs attribute should be created or
> not.

I see, this looks like a good improvement to our current way of managing
sysfs. Since this change is somewhat fundamental and requires good testing,
I prefer to deal with it separately from my current work on device unplug,
so I will put it on my TODO list right after finishing this work.

Andrey


>
> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 4/8] drm/amdgpu: Split amdgpu_device_fini into early and late
  2020-06-22  9:48   ` Daniel Vetter
@ 2020-11-12  4:19     ` Andrey Grodzovsky
  2020-11-12  9:29       ` Daniel Vetter
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-12  4:19 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher


On 6/22/20 5:48 AM, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 02:03:04AM -0400, Andrey Grodzovsky wrote:
>> Some of the stuff in amdgpu_device_fini, such as disabling HW interrupts
>> and finalizing pending fences, must be done right away on
>> pci_remove, while most of the stuff which relates to finalizing and releasing
>> driver data structures can be kept until the drm_driver.release hook is called, i.e.
>> when the last device reference is dropped.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> Long term I think it's best if as much of this code as possible is converted over to devm
> (for hw stuff) and drmm (for sw stuff and allocations). Doing this all
> manually is very error prone.
>
> I've started various such patches and others followed, but thus far only
> very simple drivers tackled. But it should be doable step by step at
> least, so you should have incremental benefits in code complexity right
> away I hope.
> -Daniel


Sure, I will definitely add this to my TODOs for after landing (hopefully)
this patch set (after a few more iterations), as indeed the required changes
for using devm and drmm are non-trivial, and I prefer to avoid diverging here
into multiple directions at once.
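
As a note for that TODO, a rough sketch of the direction, using the existing
devm_add_action_or_reset()/drmm_add_action_or_reset() helpers; the split
shown here is illustrative, not the final one:

static void example_hw_fini(void *data)
{
	struct amdgpu_device *adev = data;

	/* hardware teardown: runs when the struct device goes away */
	amdgpu_irq_fini_early(adev);
}

static void example_sw_fini(struct drm_device *dev, void *data)
{
	/* software teardown: runs on the last drm_dev_put() */
	amdgpu_device_fini_late(data);
}

static int example_init(struct amdgpu_device *adev)
{
	int r;

	r = devm_add_action_or_reset(adev->dev, example_hw_fini, adev);
	if (r)
		return r;

	return drmm_add_action_or_reset(adev->ddev, example_sw_fini, adev);
}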

Andrey


>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++++++++++----
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  6 ++----
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 23 +++++++++++++++++------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
>>   7 files changed, 54 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 2a806cb..604a681 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1003,7 +1003,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>   		       struct drm_device *ddev,
>>   		       struct pci_dev *pdev,
>>   		       uint32_t flags);
>> -void amdgpu_device_fini(struct amdgpu_device *adev);
>> +void amdgpu_device_fini_early(struct amdgpu_device *adev);
>> +void amdgpu_device_fini_late(struct amdgpu_device *adev);
>> +
>>   int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
>>   
>>   void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
>> @@ -1188,6 +1190,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
>>   int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
>>   void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>   				 struct drm_file *file_priv);
>> +void amdgpu_driver_release_kms(struct drm_device *dev);
>> +
>>   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
>>   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
>>   int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index cc41e8f..e7b9065 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2309,6 +2309,8 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
>>   {
>>   	int i, r;
>>   
>> +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
>> +
>>   	amdgpu_ras_pre_fini(adev);
>>   
>>   	if (adev->gmc.xgmi.num_physical_nodes > 1)
>> @@ -3304,10 +3306,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>    * Tear down the driver info (all asics).
>>    * Called at driver shutdown.
>>    */
>> -void amdgpu_device_fini(struct amdgpu_device *adev)
>> +void amdgpu_device_fini_early(struct amdgpu_device *adev)
>>   {
>> -	int r;
>> -
>>   	DRM_INFO("amdgpu: finishing device.\n");
>>   	flush_delayed_work(&adev->delayed_init_work);
>>   	adev->shutdown = true;
>> @@ -3330,7 +3330,13 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>>   	if (adev->pm_sysfs_en)
>>   		amdgpu_pm_sysfs_fini(adev);
>>   	amdgpu_fbdev_fini(adev);
>> -	r = amdgpu_device_ip_fini(adev);
>> +
>> +	amdgpu_irq_fini_early(adev);
>> +}
>> +
>> +void amdgpu_device_fini_late(struct amdgpu_device *adev)
>> +{
>> +	amdgpu_device_ip_fini(adev);
>>   	if (adev->firmware.gpu_info_fw) {
>>   		release_firmware(adev->firmware.gpu_info_fw);
>>   		adev->firmware.gpu_info_fw = NULL;
>> @@ -3368,6 +3374,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>>   		amdgpu_pmu_fini(adev);
>>   	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
>>   		amdgpu_discovery_fini(adev);
>> +
>>   }
>>   
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 9e5afa5..43592dc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -1134,12 +1134,9 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>>   {
>>   	struct drm_device *dev = pci_get_drvdata(pdev);
>>   
>> -#ifdef MODULE
>> -	if (THIS_MODULE->state != MODULE_STATE_GOING)
>> -#endif
>> -		DRM_ERROR("Hotplug removal is not supported\n");
>>   	drm_dev_unplug(dev);
>>   	amdgpu_driver_unload_kms(dev);
>> +
>>   	pci_disable_device(pdev);
>>   	pci_set_drvdata(pdev, NULL);
>>   	drm_dev_put(dev);
>> @@ -1445,6 +1442,7 @@ static struct drm_driver kms_driver = {
>>   	.dumb_create = amdgpu_mode_dumb_create,
>>   	.dumb_map_offset = amdgpu_mode_dumb_mmap,
>>   	.fops = &amdgpu_driver_kms_fops,
>> +	.release = &amdgpu_driver_release_kms,
>>   
>>   	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>>   	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index 0cc4c67..1697655 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -49,6 +49,7 @@
>>   #include <drm/drm_irq.h>
>>   #include <drm/drm_vblank.h>
>>   #include <drm/amdgpu_drm.h>
>> +#include <drm/drm_drv.h>
>>   #include "amdgpu.h"
>>   #include "amdgpu_ih.h"
>>   #include "atom.h"
>> @@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>   	return 0;
>>   }
>>   
>> +
>> +void amdgpu_irq_fini_early(struct amdgpu_device *adev)
>> +{
>> +	if (adev->irq.installed) {
>> +		drm_irq_uninstall(adev->ddev);
>> +		adev->irq.installed = false;
>> +		if (adev->irq.msi_enabled)
>> +			pci_free_irq_vectors(adev->pdev);
>> +
>> +		if (!amdgpu_device_has_dc_support(adev))
>> +			flush_work(&adev->hotplug_work);
>> +	}
>> +}
>> +
>>   /**
>>    * amdgpu_irq_fini - shut down interrupt handling
>>    *
>> @@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>>   {
>>   	unsigned i, j;
>>   
>> -	if (adev->irq.installed) {
>> -		drm_irq_uninstall(adev->ddev);
>> -		adev->irq.installed = false;
>> -		if (adev->irq.msi_enabled)
>> -			pci_free_irq_vectors(adev->pdev);
>> -		if (!amdgpu_device_has_dc_support(adev))
>> -			flush_work(&adev->hotplug_work);
>> -	}
>> -
>>   	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>>   		if (!adev->irq.client[i].sources)
>>   			continue;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> index c718e94..718c70f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> @@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
>>   
>>   int amdgpu_irq_init(struct amdgpu_device *adev);
>>   void amdgpu_irq_fini(struct amdgpu_device *adev);
>> +void amdgpu_irq_fini_early(struct amdgpu_device *adev);
>>   int amdgpu_irq_add_id(struct amdgpu_device *adev,
>>   		      unsigned client_id, unsigned src_id,
>>   		      struct amdgpu_irq_src *source);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index c0b1904..9d0af22 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -29,6 +29,7 @@
>>   #include "amdgpu.h"
>>   #include <drm/drm_debugfs.h>
>>   #include <drm/amdgpu_drm.h>
>> +#include <drm/drm_drv.h>
>>   #include "amdgpu_sched.h"
>>   #include "amdgpu_uvd.h"
>>   #include "amdgpu_vce.h"
>> @@ -86,7 +87,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>   	amdgpu_unregister_gpu_instance(adev);
>>   
>>   	if (adev->rmmio == NULL)
>> -		goto done_free;
>> +		return;
>>   
>>   	if (adev->runpm) {
>>   		pm_runtime_get_sync(dev->dev);
>> @@ -95,11 +96,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>   
>>   	amdgpu_acpi_fini(adev);
>>   
>> -	amdgpu_device_fini(adev);
>> -
>> -done_free:
>> -	kfree(adev);
>> -	dev->dev_private = NULL;
>> +	amdgpu_device_fini_early(adev);
>>   }
>>   
>>   void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
>> @@ -1108,6 +1105,20 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>   	pm_runtime_put_autosuspend(dev->dev);
>>   }
>>   
>> +
>> +void amdgpu_driver_release_kms (struct drm_device *dev)
>> +{
>> +	struct amdgpu_device *adev = dev->dev_private;
>> +
>> +	amdgpu_device_fini_late(adev);
>> +
>> +	kfree(adev);
>> +	dev->dev_private = NULL;
>> +
>> +	drm_dev_fini(dev);
>> +	kfree(dev);
>> +}
>> +
>>   /*
>>    * VBlank related functions.
>>    */
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> index 7348619..169c2239 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> @@ -2056,9 +2056,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
>>   {
>>   	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>>   
>> +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
>> +
>>   	if (!con)
>>   		return 0;
>>   
>> +
>>   	/* Need disable ras on all IPs here before ip [hw/sw]fini */
>>   	amdgpu_ras_disable_all_features(adev, 0);
>>   	amdgpu_ras_recovery_fini(adev);
>> -- 
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 4/8] drm/amdgpu: Split amdgpu_device_fini into early and late
  2020-11-12  4:19     ` Andrey Grodzovsky
@ 2020-11-12  9:29       ` Daniel Vetter
  0 siblings, 0 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-11-12  9:29 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Wed, Nov 11, 2020 at 11:19:04PM -0500, Andrey Grodzovsky wrote:
> 
> On 6/22/20 5:48 AM, Daniel Vetter wrote:
> > On Sun, Jun 21, 2020 at 02:03:04AM -0400, Andrey Grodzovsky wrote:
> > > Some of the stuff in amdgpu_device_fini, such as disabling HW interrupts
> > > and finalizing pending fences, must be done right away on pci_remove,
> > > while most of the stuff that relates to finalizing and releasing driver
> > > data structures can be kept until the drm_driver.release hook is called,
> > > i.e. when the last device reference is dropped.
> > > 
> > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > Long term I think it's best if as much of this code as possible is
> > converted over to devm (for hw stuff) and drmm (for sw stuff and
> > allocations). Doing this all manually is very error prone.
> > 
> > I've started various such patches and others followed, but thus far only
> > very simple drivers have been tackled. But it should be doable step by
> > step at least, so you should have incremental benefits in code complexity
> > right away I hope.
> > -Daniel
> 
> 
> Sure, I will definitely add this to my TODOs for after (hopefully) landing
> this patch set (after a few more iterations), as indeed the required changes
> for using devm and drmm are non-trivial and I prefer to avoid diverging here
> into multiple directions at once.

For the display side there's a very nice patch series from Philipp Zabel
pending:

https://lore.kernel.org/dri-devel/20200911135724.25833-1-p.zabel@pengutronix.de/

I think you'll want to use this. It's not landed yet, so a nudge from
someone else also using it would help I think.
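
For reference, the drmm flavour of this looks roughly like the sketch below.
Purely illustrative and hand-written, not taken from that series;
amdgpu_register_late_fini and the exact split are made up, but
drmm_add_action_or_reset() is the real entry point:

#include <drm/drm_managed.h>

/* The release action runs when the last drm_device reference is dropped,
 * i.e. exactly where the manual .release hook work would otherwise go. */
static void amdgpu_fini_late_action(struct drm_device *dev, void *data)
{
	struct amdgpu_device *adev = data;

	amdgpu_device_fini_late(adev);	/* sw teardown, safe after unplug */
}

static int amdgpu_register_late_fini(struct amdgpu_device *adev)
{
	return drmm_add_action_or_reset(adev->ddev,
					amdgpu_fini_late_action, adev);
}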

Cheers, Daniel

> 
> Andrey
> 
> 
> > 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++++++++++----
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  6 ++----
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 23 +++++++++++++++++------
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
> > >   7 files changed, 54 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > index 2a806cb..604a681 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > @@ -1003,7 +1003,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> > >   		       struct drm_device *ddev,
> > >   		       struct pci_dev *pdev,
> > >   		       uint32_t flags);
> > > -void amdgpu_device_fini(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_early(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_late(struct amdgpu_device *adev);
> > > +
> > >   int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
> > >   void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> > > @@ -1188,6 +1190,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
> > >   int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
> > >   void amdgpu_driver_postclose_kms(struct drm_device *dev,
> > >   				 struct drm_file *file_priv);
> > > +void amdgpu_driver_release_kms(struct drm_device *dev);
> > > +
> > >   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
> > >   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
> > >   int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index cc41e8f..e7b9065 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -2309,6 +2309,8 @@ static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
> > >   {
> > >   	int i, r;
> > > +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
> > > +
> > >   	amdgpu_ras_pre_fini(adev);
> > >   	if (adev->gmc.xgmi.num_physical_nodes > 1)
> > > @@ -3304,10 +3306,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> > >    * Tear down the driver info (all asics).
> > >    * Called at driver shutdown.
> > >    */
> > > -void amdgpu_device_fini(struct amdgpu_device *adev)
> > > +void amdgpu_device_fini_early(struct amdgpu_device *adev)
> > >   {
> > > -	int r;
> > > -
> > >   	DRM_INFO("amdgpu: finishing device.\n");
> > >   	flush_delayed_work(&adev->delayed_init_work);
> > >   	adev->shutdown = true;
> > > @@ -3330,7 +3330,13 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
> > >   	if (adev->pm_sysfs_en)
> > >   		amdgpu_pm_sysfs_fini(adev);
> > >   	amdgpu_fbdev_fini(adev);
> > > -	r = amdgpu_device_ip_fini(adev);
> > > +
> > > +	amdgpu_irq_fini_early(adev);
> > > +}
> > > +
> > > +void amdgpu_device_fini_late(struct amdgpu_device *adev)
> > > +{
> > > +	amdgpu_device_ip_fini(adev);
> > >   	if (adev->firmware.gpu_info_fw) {
> > >   		release_firmware(adev->firmware.gpu_info_fw);
> > >   		adev->firmware.gpu_info_fw = NULL;
> > > @@ -3368,6 +3374,7 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
> > >   		amdgpu_pmu_fini(adev);
> > >   	if (amdgpu_discovery && adev->asic_type >= CHIP_NAVI10)
> > >   		amdgpu_discovery_fini(adev);
> > > +
> > >   }
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index 9e5afa5..43592dc 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -1134,12 +1134,9 @@ amdgpu_pci_remove(struct pci_dev *pdev)
> > >   {
> > >   	struct drm_device *dev = pci_get_drvdata(pdev);
> > > -#ifdef MODULE
> > > -	if (THIS_MODULE->state != MODULE_STATE_GOING)
> > > -#endif
> > > -		DRM_ERROR("Hotplug removal is not supported\n");
> > >   	drm_dev_unplug(dev);
> > >   	amdgpu_driver_unload_kms(dev);
> > > +
> > >   	pci_disable_device(pdev);
> > >   	pci_set_drvdata(pdev, NULL);
> > >   	drm_dev_put(dev);
> > > @@ -1445,6 +1442,7 @@ static struct drm_driver kms_driver = {
> > >   	.dumb_create = amdgpu_mode_dumb_create,
> > >   	.dumb_map_offset = amdgpu_mode_dumb_mmap,
> > >   	.fops = &amdgpu_driver_kms_fops,
> > > +	.release = &amdgpu_driver_release_kms,
> > >   	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> > >   	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> > > index 0cc4c67..1697655 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> > > @@ -49,6 +49,7 @@
> > >   #include <drm/drm_irq.h>
> > >   #include <drm/drm_vblank.h>
> > >   #include <drm/amdgpu_drm.h>
> > > +#include <drm/drm_drv.h>
> > >   #include "amdgpu.h"
> > >   #include "amdgpu_ih.h"
> > >   #include "atom.h"
> > > @@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
> > >   	return 0;
> > >   }
> > > +
> > > +void amdgpu_irq_fini_early(struct amdgpu_device *adev)
> > > +{
> > > +	if (adev->irq.installed) {
> > > +		drm_irq_uninstall(adev->ddev);
> > > +		adev->irq.installed = false;
> > > +		if (adev->irq.msi_enabled)
> > > +			pci_free_irq_vectors(adev->pdev);
> > > +
> > > +		if (!amdgpu_device_has_dc_support(adev))
> > > +			flush_work(&adev->hotplug_work);
> > > +	}
> > > +}
> > > +
> > >   /**
> > >    * amdgpu_irq_fini - shut down interrupt handling
> > >    *
> > > @@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
> > >   {
> > >   	unsigned i, j;
> > > -	if (adev->irq.installed) {
> > > -		drm_irq_uninstall(adev->ddev);
> > > -		adev->irq.installed = false;
> > > -		if (adev->irq.msi_enabled)
> > > -			pci_free_irq_vectors(adev->pdev);
> > > -		if (!amdgpu_device_has_dc_support(adev))
> > > -			flush_work(&adev->hotplug_work);
> > > -	}
> > > -
> > >   	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
> > >   		if (!adev->irq.client[i].sources)
> > >   			continue;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> > > index c718e94..718c70f 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> > > @@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
> > >   int amdgpu_irq_init(struct amdgpu_device *adev);
> > >   void amdgpu_irq_fini(struct amdgpu_device *adev);
> > > +void amdgpu_irq_fini_early(struct amdgpu_device *adev);
> > >   int amdgpu_irq_add_id(struct amdgpu_device *adev,
> > >   		      unsigned client_id, unsigned src_id,
> > >   		      struct amdgpu_irq_src *source);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > > index c0b1904..9d0af22 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > > @@ -29,6 +29,7 @@
> > >   #include "amdgpu.h"
> > >   #include <drm/drm_debugfs.h>
> > >   #include <drm/amdgpu_drm.h>
> > > +#include <drm/drm_drv.h>
> > >   #include "amdgpu_sched.h"
> > >   #include "amdgpu_uvd.h"
> > >   #include "amdgpu_vce.h"
> > > @@ -86,7 +87,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
> > >   	amdgpu_unregister_gpu_instance(adev);
> > >   	if (adev->rmmio == NULL)
> > > -		goto done_free;
> > > +		return;
> > >   	if (adev->runpm) {
> > >   		pm_runtime_get_sync(dev->dev);
> > > @@ -95,11 +96,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
> > >   	amdgpu_acpi_fini(adev);
> > > -	amdgpu_device_fini(adev);
> > > -
> > > -done_free:
> > > -	kfree(adev);
> > > -	dev->dev_private = NULL;
> > > +	amdgpu_device_fini_early(adev);
> > >   }
> > >   void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
> > > @@ -1108,6 +1105,20 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
> > >   	pm_runtime_put_autosuspend(dev->dev);
> > >   }
> > > +
> > > +void amdgpu_driver_release_kms (struct drm_device *dev)
> > > +{
> > > +	struct amdgpu_device *adev = dev->dev_private;
> > > +
> > > +	amdgpu_device_fini_late(adev);
> > > +
> > > +	kfree(adev);
> > > +	dev->dev_private = NULL;
> > > +
> > > +	drm_dev_fini(dev);
> > > +	kfree(dev);
> > > +}
> > > +
> > >   /*
> > >    * VBlank related functions.
> > >    */
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > > index 7348619..169c2239 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > > @@ -2056,9 +2056,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
> > >   {
> > >   	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
> > > +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
> > > +
> > >   	if (!con)
> > >   		return 0;
> > > +
> > >   	/* Need disable ras on all IPs here before ip [hw/sw]fini */
> > >   	amdgpu_ras_disable_all_features(adev, 0);
> > >   	amdgpu_ras_recovery_fini(adev);
> > > -- 
> > > 2.7.4
> > > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22 17:50         ` Daniel Vetter
  2020-11-09 20:53           ` Andrey Grodzovsky
@ 2020-11-13 20:52           ` Andrey Grodzovsky
  2020-11-14  8:41             ` Christian König
  1 sibling, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-13 20:52 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: Alex Deucher, Michel Dänzer, Pekka Paalanen, dri-devel,
	amd-gfx list


On 6/22/20 1:50 PM, Daniel Vetter wrote:
> On Mon, Jun 22, 2020 at 7:45 PM Christian König
> <christian.koenig@amd.com> wrote:
>> On 22.06.20 at 16:32, Andrey Grodzovsky wrote:
>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>> On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>> device is removed.
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>    drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>    include/drm/drm_file.h      |  2 ++
>>>>>    include/drm/drm_gem.h       |  2 ++
>>>>>    4 files changed, 22 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>> index c4c704e..67c0770 100644
>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>> drm_minor *minor)
>>>>>                goto out_prime_destroy;
>>>>>        }
>>>>>    +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>> +    if (!file->dummy_page) {
>>>>> +        ret = -ENOMEM;
>>>>> +        goto out_prime_destroy;
>>>>> +    }
>>>>> +
>>>>>        return file;
>>>>>      out_prime_destroy:
>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>        if (dev->driver->postclose)
>>>>>            dev->driver->postclose(dev, file);
>>>>>    +    __free_page(file->dummy_page);
>>>>> +
>>>>>        drm_prime_destroy_file_private(&file->prime);
>>>>>          WARN_ON(!list_empty(&file->event_list));
>>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>>>> index 1de2cde..c482e9c 100644
>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>> drm_device *dev,
>>>>>          ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>                dma_buf, *handle);
>>>>> +
>>>>> +    if (!ret) {
>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>> +        if (!obj->dummy_page)
>>>>> +            ret = -ENOMEM;
>>>>> +    }
>>>>> +
>>>> While the per file case still looks acceptable this is a clear NAK
>>>> since it will massively increase the memory needed for a prime
>>>> exported object.
>>>>
>>>> I think that this is quite overkill in the first place and for the
>>>> hot unplug case we can just use the global dummy page as well.
>>>>
>>>> Christian.
>>>
>>> Global dummy page is good for read access, what do you do on write
>>> access ? My first approach was indeed to map at first global dummy
>>> page as read only and mark the vma->vm_flags as !VM_SHARED assuming
>>> that this would trigger Copy On Write flow in core mm
>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
>>> on the next page fault to same address triggered by a write access but
>>> then i realized a new COW page will be allocated for each such mapping
>>> and this is much more wasteful then having a dedicated page per GEM
>>> object.
>> Yeah, but this is only for a very very small corner cases. What we need
>> to prevent is increasing the memory usage during normal operation to much.
>>
>> Using memory during the unplug is completely unproblematic because we
>> just released quite a bunch of it by releasing all those system memory
>> buffers.
>>
>> And I'm pretty sure that COWed pages are correctly accounted towards the
>> used memory of a process.
>>
>> So I think if that approach works as intended and the COW pages are
>> released again on unmapping it would be the perfect solution to the problem.
>>
>> Daniel what do you think?
> If COW works, sure sounds reasonable. And if we can make sure we
> managed to drop all the system allocations (otherwise suddenly 2x
> memory usage, worst case). But I have no idea whether we can
> retroshoehorn that into an established vma, you might have fun stuff
> like a mkwrite handler there (which I thought is the COW handler
> thing, but really no idea).
>
> If we need to massively change stuff then I think rw dummy page,
> allocated on first fault after hotunplug (maybe just make it one per
> object, that's simplest) seems like the much safer option. Much less
> code that can go wrong.
> -Daniel


Regarding COW, I was looking into how to properly implement it from within
the fault handler (i.e. ttm_bo_vm_fault), and the main obstacle I hit is that
of exclusive access to the vm_area_struct: I need to be able to modify
vma->vm_flags (and vm_page_prot) to remove the VM_SHARED bit so COW can be
triggered on a subsequent write access fault (here
https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128),
but core mm takes only the read side of mm_sem (here for example
https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488),
and so I am not supposed to modify the vm_area_struct in this case. I am not
sure if it's legit to write-lock the mm_sem from this point. I found some
discussion about this here
http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it wasn't
really clear to me what the solution is.
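
In sketch form, the step that this locking rule forbids looks something like
this - illustration only, not working code:

#include <linux/mm.h>

/* Hypothetical: runs in the fault path, where core mm holds mmap_lock
 * (mm_sem) only for read, so clearing VM_SHARED here is exactly what is
 * not allowed without first upgrading to the write side. */
static vm_fault_t fault_force_cow(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;

	vma->vm_flags &= ~VM_SHARED;	/* would need the mmap write lock */
	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);

	return VM_FAULT_RETRY;	/* refault so the write takes the COW path */
}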

In any case, it seems to me that the easier and more memory-saving solution
would be to just switch to a per-TTM-BO dummy rw page that would be allocated
on demand, as you suggested here. This should also take care of imported BOs
and flink cases. Then I can drop the per-device-FD and per-GEM-object dummy
pages and the ugly loop I am using in patch 2 to match the faulting BO to the
right dummy page.
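
Roughly along these lines - a minimal sketch, with a hypothetical
bo->dummy_page field and the allocation race ignored:

#include <linux/mm.h>
#include <drm/ttm/ttm_bo_api.h>

static vm_fault_t ttm_bo_vm_fault_dummy(struct vm_fault *vmf)
{
	struct ttm_buffer_object *bo = vmf->vma->vm_private_data;

	/* Allocate the zeroed rw dummy page on first fault after unplug. */
	if (!bo->dummy_page) {
		bo->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
		if (!bo->dummy_page)
			return VM_FAULT_OOM;
	}

	/* From here on every CPU access to this BO lands on one page. */
	return vmf_insert_pfn(vmf->vma, vmf->address,
			      page_to_pfn(bo->dummy_page));
}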

Does this make sense?

Andrey


>
>> Regards,
>> Christian.
>>
>>> We can indeed optimize by allocating this dummy page on the first page
>>> fault after device disconnect instead on GEM object creation.
>>>
>>> Andrey
>>>
>>>
>>>>> mutex_unlock(&file_priv->prime.lock);
>>>>>        if (ret)
>>>>>            goto fail;
>>>>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct
>>>>> drm_gem_object *obj, struct sg_table *sg)
>>>>>            dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>>        dma_buf = attach->dmabuf;
>>>>>        dma_buf_detach(attach->dmabuf, attach);
>>>>> +
>>>>> +    __free_page(obj->dummy_page);
>>>>> +
>>>>>        /* remove the reference */
>>>>>        dma_buf_put(dma_buf);
>>>>>    }
>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>> index 19df802..349a658 100644
>>>>> --- a/include/drm/drm_file.h
>>>>> +++ b/include/drm/drm_file.h
>>>>> @@ -335,6 +335,8 @@ struct drm_file {
>>>>>         */
>>>>>        struct drm_prime_file_private prime;
>>>>>    +    struct page *dummy_page;
>>>>> +
>>>>>        /* private: */
>>>>>    #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>>        unsigned long lock_count; /* DRI1 legacy lock count */
>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>> index 0b37506..47460d1 100644
>>>>> --- a/include/drm/drm_gem.h
>>>>> +++ b/include/drm/drm_gem.h
>>>>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>>>>         *
>>>>>         */
>>>>>        const struct drm_gem_object_funcs *funcs;
>>>>> +
>>>>> +    struct page *dummy_page;
>>>>>    };
>>>>>      /**
>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-13 20:52           ` Andrey Grodzovsky
@ 2020-11-14  8:41             ` Christian König
  2020-11-14  9:51               ` Daniel Vetter
  0 siblings, 1 reply; 97+ messages in thread
From: Christian König @ 2020-11-14  8:41 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter, Christian König
  Cc: Alex Deucher, Michel Dänzer, amd-gfx list, Pekka Paalanen,
	dri-devel

On 13.11.20 at 21:52, Andrey Grodzovsky wrote:
>
> On 6/22/20 1:50 PM, Daniel Vetter wrote:
>> On Mon, Jun 22, 2020 at 7:45 PM Christian König
>> <christian.koenig@amd.com> wrote:
>>> On 22.06.20 at 16:32, Andrey Grodzovsky wrote:
>>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>>> On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>> device is removed.
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>    drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>    include/drm/drm_file.h      |  2 ++
>>>>>>    include/drm/drm_gem.h       |  2 ++
>>>>>>    4 files changed, 22 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>> index c4c704e..67c0770 100644
>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>>> drm_minor *minor)
>>>>>>                goto out_prime_destroy;
>>>>>>        }
>>>>>>    +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>> +    if (!file->dummy_page) {
>>>>>> +        ret = -ENOMEM;
>>>>>> +        goto out_prime_destroy;
>>>>>> +    }
>>>>>> +
>>>>>>        return file;
>>>>>>      out_prime_destroy:
>>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>        if (dev->driver->postclose)
>>>>>>            dev->driver->postclose(dev, file);
>>>>>>    +    __free_page(file->dummy_page);
>>>>>> +
>>>>>> drm_prime_destroy_file_private(&file->prime);
>>>>>>          WARN_ON(!list_empty(&file->event_list));
>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c 
>>>>>> b/drivers/gpu/drm/drm_prime.c
>>>>>> index 1de2cde..c482e9c 100644
>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>>> drm_device *dev,
>>>>>>          ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>                dma_buf, *handle);
>>>>>> +
>>>>>> +    if (!ret) {
>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>> +        if (!obj->dummy_page)
>>>>>> +            ret = -ENOMEM;
>>>>>> +    }
>>>>>> +
>>>>> While the per file case still looks acceptable this is a clear NAK
>>>>> since it will massively increase the memory needed for a prime
>>>>> exported object.
>>>>>
>>>>> I think that this is quite overkill in the first place and for the
>>>>> hot unplug case we can just use the global dummy page as well.
>>>>>
>>>>> Christian.
>>>>
>>>> Global dummy page is good for read access, what do you do on write
>>>> access ? My first approach was indeed to map at first global dummy
>>>> page as read only and mark the vma->vm_flags as !VM_SHARED assuming
>>>> that this would trigger Copy On Write flow in core mm
>>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
>>>>
>>>> on the next page fault to same address triggered by a write access but
>>>> then i realized a new COW page will be allocated for each such mapping
>>>> and this is much more wasteful then having a dedicated page per GEM
>>>> object.
>>> Yeah, but this is only for a very very small corner cases. What we need
>>> to prevent is increasing the memory usage during normal operation to 
>>> much.
>>>
>>> Using memory during the unplug is completely unproblematic because we
>>> just released quite a bunch of it by releasing all those system memory
>>> buffers.
>>>
>>> And I'm pretty sure that COWed pages are correctly accounted towards 
>>> the
>>> used memory of a process.
>>>
>>> So I think if that approach works as intended and the COW pages are
>>> released again on unmapping it would be the perfect solution to the 
>>> problem.
>>>
>>> Daniel what do you think?
>> If COW works, sure sounds reasonable. And if we can make sure we
>> managed to drop all the system allocations (otherwise suddenly 2x
>> memory usage, worst case). But I have no idea whether we can
>> retroshoehorn that into an established vma, you might have fun stuff
>> like a mkwrite handler there (which I thought is the COW handler
>> thing, but really no idea).
>>
>> If we need to massively change stuff then I think rw dummy page,
>> allocated on first fault after hotunplug (maybe just make it one per
>> object, that's simplest) seems like the much safer option. Much less
>> code that can go wrong.
>> -Daniel
>
>
> Regarding COW, i was looking into how to properly implement it from 
> within the fault handler (i.e. ttm_bo_vm_fault)
> and the main obstacle I hit is that of exclusive access to the 
> vm_area_struct, i need to be able to modify
> vma->vm_flags (and vm_page_prot)  to remove VM_SHARED bit so COW can 
> be triggered on subsequent write access
> fault (here 
> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
> but core mm takes only read side mm_sem (here for example 
> https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488)
> and so I am not supposed to modify vm_area_struct in this case. I am 
> not sure if it's legit to write lock tthe mm_sem from this point.
> I found some discussions about this here 
> http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it 
> wasn't really clear to me
> what's the solution.
>
> In any case, seems to me that easier and more memory saving solution 
> would be to just switch to per ttm bo dumy rw page that
> would be allocated on demand as you suggested here.  This should also 
> take care of imported BOs and flink cases.
> Then i can drop the per device FD and per GEM object FD dummy BO and 
> the ugly loop i am using in patch 2 to match faulting BO to the right 
> dummy page.
>
> Does this makes sense ?

I still don't see the information leak as much of a problem, but if 
Daniel insists we should probably do this.

But could we at least have only one page per client instead of per BO?

Thanks,
Christian.

>
> Andrey
>
>
>>
>>> Regards,
>>> Christian.
>>>
>>>> We can indeed optimize by allocating this dummy page on the first page
>>>> fault after device disconnect instead on GEM object creation.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>> mutex_unlock(&file_priv->prime.lock);
>>>>>>        if (ret)
>>>>>>            goto fail;
>>>>>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct
>>>>>> drm_gem_object *obj, struct sg_table *sg)
>>>>>>            dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>>>        dma_buf = attach->dmabuf;
>>>>>>        dma_buf_detach(attach->dmabuf, attach);
>>>>>> +
>>>>>> +    __free_page(obj->dummy_page);
>>>>>> +
>>>>>>        /* remove the reference */
>>>>>>        dma_buf_put(dma_buf);
>>>>>>    }
>>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>>> index 19df802..349a658 100644
>>>>>> --- a/include/drm/drm_file.h
>>>>>> +++ b/include/drm/drm_file.h
>>>>>> @@ -335,6 +335,8 @@ struct drm_file {
>>>>>>         */
>>>>>>        struct drm_prime_file_private prime;
>>>>>>    +    struct page *dummy_page;
>>>>>> +
>>>>>>        /* private: */
>>>>>>    #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>>>        unsigned long lock_count; /* DRI1 legacy lock count */
>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>>> index 0b37506..47460d1 100644
>>>>>> --- a/include/drm/drm_gem.h
>>>>>> +++ b/include/drm/drm_gem.h
>>>>>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>>>>>         *
>>>>>>         */
>>>>>>        const struct drm_gem_object_funcs *funcs;
>>>>>> +
>>>>>> +    struct page *dummy_page;
>>>>>>    };
>>>>>>      /**
>>


^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-14  8:41             ` Christian König
@ 2020-11-14  9:51               ` Daniel Vetter
  2020-11-14  9:57                 ` Daniel Vetter
  2020-11-15  6:34                 ` Andrey Grodzovsky
  0 siblings, 2 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-11-14  9:51 UTC (permalink / raw)
  To: Christian König
  Cc: Andrey Grodzovsky, Michel Dänzer, amd-gfx list,
	Pekka Paalanen, dri-devel, Alex Deucher

On Sat, Nov 14, 2020 at 9:41 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> On 13.11.20 at 21:52, Andrey Grodzovsky wrote:
> >
> > On 6/22/20 1:50 PM, Daniel Vetter wrote:
> >> On Mon, Jun 22, 2020 at 7:45 PM Christian König
> >> <christian.koenig@amd.com> wrote:
> >>> On 22.06.20 at 16:32, Andrey Grodzovsky wrote:
> >>>> On 6/22/20 9:18 AM, Christian König wrote:
> >>>>> On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
> >>>>>> Will be used to reroute CPU mapped BO's page faults once
> >>>>>> device is removed.
> >>>>>>
> >>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>>>>> ---
> >>>>>>    drivers/gpu/drm/drm_file.c  |  8 ++++++++
> >>>>>>    drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> >>>>>>    include/drm/drm_file.h      |  2 ++
> >>>>>>    include/drm/drm_gem.h       |  2 ++
> >>>>>>    4 files changed, 22 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> >>>>>> index c4c704e..67c0770 100644
> >>>>>> --- a/drivers/gpu/drm/drm_file.c
> >>>>>> +++ b/drivers/gpu/drm/drm_file.c
> >>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
> >>>>>> drm_minor *minor)
> >>>>>>                goto out_prime_destroy;
> >>>>>>        }
> >>>>>>    +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> >>>>>> +    if (!file->dummy_page) {
> >>>>>> +        ret = -ENOMEM;
> >>>>>> +        goto out_prime_destroy;
> >>>>>> +    }
> >>>>>> +
> >>>>>>        return file;
> >>>>>>      out_prime_destroy:
> >>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
> >>>>>>        if (dev->driver->postclose)
> >>>>>>            dev->driver->postclose(dev, file);
> >>>>>>    +    __free_page(file->dummy_page);
> >>>>>> +
> >>>>>> drm_prime_destroy_file_private(&file->prime);
> >>>>>>          WARN_ON(!list_empty(&file->event_list));
> >>>>>> diff --git a/drivers/gpu/drm/drm_prime.c
> >>>>>> b/drivers/gpu/drm/drm_prime.c
> >>>>>> index 1de2cde..c482e9c 100644
> >>>>>> --- a/drivers/gpu/drm/drm_prime.c
> >>>>>> +++ b/drivers/gpu/drm/drm_prime.c
> >>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
> >>>>>> drm_device *dev,
> >>>>>>          ret = drm_prime_add_buf_handle(&file_priv->prime,
> >>>>>>                dma_buf, *handle);
> >>>>>> +
> >>>>>> +    if (!ret) {
> >>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> >>>>>> +        if (!obj->dummy_page)
> >>>>>> +            ret = -ENOMEM;
> >>>>>> +    }
> >>>>>> +
> >>>>> While the per file case still looks acceptable this is a clear NAK
> >>>>> since it will massively increase the memory needed for a prime
> >>>>> exported object.
> >>>>>
> >>>>> I think that this is quite overkill in the first place and for the
> >>>>> hot unplug case we can just use the global dummy page as well.
> >>>>>
> >>>>> Christian.
> >>>>
> >>>> Global dummy page is good for read access, what do you do on write
> >>>> access ? My first approach was indeed to map at first global dummy
> >>>> page as read only and mark the vma->vm_flags as !VM_SHARED assuming
> >>>> that this would trigger Copy On Write flow in core mm
> >>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
> >>>>
> >>>> on the next page fault to same address triggered by a write access but
> >>>> then i realized a new COW page will be allocated for each such mapping
> >>>> and this is much more wasteful then having a dedicated page per GEM
> >>>> object.
> >>> Yeah, but this is only for a very very small corner cases. What we need
> >>> to prevent is increasing the memory usage during normal operation to
> >>> much.
> >>>
> >>> Using memory during the unplug is completely unproblematic because we
> >>> just released quite a bunch of it by releasing all those system memory
> >>> buffers.
> >>>
> >>> And I'm pretty sure that COWed pages are correctly accounted towards
> >>> the
> >>> used memory of a process.
> >>>
> >>> So I think if that approach works as intended and the COW pages are
> >>> released again on unmapping it would be the perfect solution to the
> >>> problem.
> >>>
> >>> Daniel what do you think?
> >> If COW works, sure sounds reasonable. And if we can make sure we
> >> managed to drop all the system allocations (otherwise suddenly 2x
> >> memory usage, worst case). But I have no idea whether we can
> >> retroshoehorn that into an established vma, you might have fun stuff
> >> like a mkwrite handler there (which I thought is the COW handler
> >> thing, but really no idea).
> >>
> >> If we need to massively change stuff then I think rw dummy page,
> >> allocated on first fault after hotunplug (maybe just make it one per
> >> object, that's simplest) seems like the much safer option. Much less
> >> code that can go wrong.
> >> -Daniel
> >
> >
> > Regarding COW, i was looking into how to properly implement it from
> > within the fault handler (i.e. ttm_bo_vm_fault)
> > and the main obstacle I hit is that of exclusive access to the
> > vm_area_struct, i need to be able to modify
> > vma->vm_flags (and vm_page_prot)  to remove VM_SHARED bit so COW can
> > be triggered on subsequent write access
> > fault (here
> > https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
> > but core mm takes only read side mm_sem (here for example
> > https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488)
> > and so I am not supposed to modify vm_area_struct in this case. I am
> > not sure if it's legit to write lock tthe mm_sem from this point.
> > I found some discussions about this here
> > http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it
> > wasn't really clear to me
> > what's the solution.
> >
> > In any case, seems to me that easier and more memory saving solution
> > would be to just switch to per ttm bo dumy rw page that
> > would be allocated on demand as you suggested here.  This should also
> > take care of imported BOs and flink cases.
> > Then i can drop the per device FD and per GEM object FD dummy BO and
> > the ugly loop i am using in patch 2 to match faulting BO to the right
> > dummy page.
> >
> > Does this makes sense ?
>
> I still don't see the information leak as much of a problem, but if
> Daniel insists we should probably do this.

Well amdgpu doesn't clear buffers by default, so indeed you guys are a
lot more laissez-faire here. But in general we really don't do that
kind of leaking. Iirc there are even radeonsi bugs because nothing else
clears, and radeonsi happily displays gunk :-)

> But could we at least have only one page per client instead of per BO?

I think you can do one page per file descriptor or something like
that. But it gets annoying with shared BOs, especially with dma_buf_mmap
forwarding.
-Daniel

>
> Thanks,
> Christian.
>
> >
> > Andrey
> >
> >
> >>
> >>> Regards,
> >>> Christian.
> >>>
> >>>> We can indeed optimize by allocating this dummy page on the first page
> >>>> fault after device disconnect instead on GEM object creation.
> >>>>
> >>>> Andrey
> >>>>
> >>>>
> >>>>>> mutex_unlock(&file_priv->prime.lock);
> >>>>>>        if (ret)
> >>>>>>            goto fail;
> >>>>>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct
> >>>>>> drm_gem_object *obj, struct sg_table *sg)
> >>>>>>            dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
> >>>>>>        dma_buf = attach->dmabuf;
> >>>>>>        dma_buf_detach(attach->dmabuf, attach);
> >>>>>> +
> >>>>>> +    __free_page(obj->dummy_page);
> >>>>>> +
> >>>>>>        /* remove the reference */
> >>>>>>        dma_buf_put(dma_buf);
> >>>>>>    }
> >>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> >>>>>> index 19df802..349a658 100644
> >>>>>> --- a/include/drm/drm_file.h
> >>>>>> +++ b/include/drm/drm_file.h
> >>>>>> @@ -335,6 +335,8 @@ struct drm_file {
> >>>>>>         */
> >>>>>>        struct drm_prime_file_private prime;
> >>>>>>    +    struct page *dummy_page;
> >>>>>> +
> >>>>>>        /* private: */
> >>>>>>    #if IS_ENABLED(CONFIG_DRM_LEGACY)
> >>>>>>        unsigned long lock_count; /* DRI1 legacy lock count */
> >>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> >>>>>> index 0b37506..47460d1 100644
> >>>>>> --- a/include/drm/drm_gem.h
> >>>>>> +++ b/include/drm/drm_gem.h
> >>>>>> @@ -310,6 +310,8 @@ struct drm_gem_object {
> >>>>>>         *
> >>>>>>         */
> >>>>>>        const struct drm_gem_object_funcs *funcs;
> >>>>>> +
> >>>>>> +    struct page *dummy_page;
> >>>>>>    };
> >>>>>>      /**
> >>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-14  9:51               ` Daniel Vetter
@ 2020-11-14  9:57                 ` Daniel Vetter
  2020-11-16  9:42                   ` Michel Dänzer
  2020-11-15  6:34                 ` Andrey Grodzovsky
  1 sibling, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-11-14  9:57 UTC (permalink / raw)
  To: Christian König
  Cc: Andrey Grodzovsky, Michel Dänzer, amd-gfx list,
	Pekka Paalanen, dri-devel, Alex Deucher

On Sat, Nov 14, 2020 at 10:51 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Sat, Nov 14, 2020 at 9:41 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > On 13.11.20 at 21:52, Andrey Grodzovsky wrote:
> > >
> > > On 6/22/20 1:50 PM, Daniel Vetter wrote:
> > >> On Mon, Jun 22, 2020 at 7:45 PM Christian König
> > >> <christian.koenig@amd.com> wrote:
> > >>> On 22.06.20 at 16:32, Andrey Grodzovsky wrote:
> > >>>> On 6/22/20 9:18 AM, Christian König wrote:
> > >>>>> On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
> > >>>>>> Will be used to reroute CPU mapped BO's page faults once
> > >>>>>> device is removed.
> > >>>>>>
> > >>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > >>>>>> ---
> > >>>>>>    drivers/gpu/drm/drm_file.c  |  8 ++++++++
> > >>>>>>    drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> > >>>>>>    include/drm/drm_file.h      |  2 ++
> > >>>>>>    include/drm/drm_gem.h       |  2 ++
> > >>>>>>    4 files changed, 22 insertions(+)
> > >>>>>>
> > >>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > >>>>>> index c4c704e..67c0770 100644
> > >>>>>> --- a/drivers/gpu/drm/drm_file.c
> > >>>>>> +++ b/drivers/gpu/drm/drm_file.c
> > >>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
> > >>>>>> drm_minor *minor)
> > >>>>>>                goto out_prime_destroy;
> > >>>>>>        }
> > >>>>>>    +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > >>>>>> +    if (!file->dummy_page) {
> > >>>>>> +        ret = -ENOMEM;
> > >>>>>> +        goto out_prime_destroy;
> > >>>>>> +    }
> > >>>>>> +
> > >>>>>>        return file;
> > >>>>>>      out_prime_destroy:
> > >>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
> > >>>>>>        if (dev->driver->postclose)
> > >>>>>>            dev->driver->postclose(dev, file);
> > >>>>>>    +    __free_page(file->dummy_page);
> > >>>>>> +
> > >>>>>> drm_prime_destroy_file_private(&file->prime);
> > >>>>>>          WARN_ON(!list_empty(&file->event_list));
> > >>>>>> diff --git a/drivers/gpu/drm/drm_prime.c
> > >>>>>> b/drivers/gpu/drm/drm_prime.c
> > >>>>>> index 1de2cde..c482e9c 100644
> > >>>>>> --- a/drivers/gpu/drm/drm_prime.c
> > >>>>>> +++ b/drivers/gpu/drm/drm_prime.c
> > >>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
> > >>>>>> drm_device *dev,
> > >>>>>>          ret = drm_prime_add_buf_handle(&file_priv->prime,
> > >>>>>>                dma_buf, *handle);
> > >>>>>> +
> > >>>>>> +    if (!ret) {
> > >>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > >>>>>> +        if (!obj->dummy_page)
> > >>>>>> +            ret = -ENOMEM;
> > >>>>>> +    }
> > >>>>>> +
> > >>>>> While the per file case still looks acceptable this is a clear NAK
> > >>>>> since it will massively increase the memory needed for a prime
> > >>>>> exported object.
> > >>>>>
> > >>>>> I think that this is quite overkill in the first place and for the
> > >>>>> hot unplug case we can just use the global dummy page as well.
> > >>>>>
> > >>>>> Christian.
> > >>>>
> > >>>> Global dummy page is good for read access, what do you do on write
> > >>>> access ? My first approach was indeed to map at first global dummy
> > >>>> page as read only and mark the vma->vm_flags as !VM_SHARED assuming
> > >>>> that this would trigger Copy On Write flow in core mm
> > >>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
> > >>>>
> > >>>> on the next page fault to same address triggered by a write access but
> > >>>> then i realized a new COW page will be allocated for each such mapping
> > >>>> and this is much more wasteful then having a dedicated page per GEM
> > >>>> object.
> > >>> Yeah, but this is only for a very very small corner cases. What we need
> > >>> to prevent is increasing the memory usage during normal operation to
> > >>> much.
> > >>>
> > >>> Using memory during the unplug is completely unproblematic because we
> > >>> just released quite a bunch of it by releasing all those system memory
> > >>> buffers.
> > >>>
> > >>> And I'm pretty sure that COWed pages are correctly accounted towards
> > >>> the
> > >>> used memory of a process.
> > >>>
> > >>> So I think if that approach works as intended and the COW pages are
> > >>> released again on unmapping it would be the perfect solution to the
> > >>> problem.
> > >>>
> > >>> Daniel what do you think?
> > >> If COW works, sure sounds reasonable. And if we can make sure we
> > >> managed to drop all the system allocations (otherwise suddenly 2x
> > >> memory usage, worst case). But I have no idea whether we can
> > >> retroshoehorn that into an established vma, you might have fun stuff
> > >> like a mkwrite handler there (which I thought is the COW handler
> > >> thing, but really no idea).
> > >>
> > >> If we need to massively change stuff then I think rw dummy page,
> > >> allocated on first fault after hotunplug (maybe just make it one per
> > >> object, that's simplest) seems like the much safer option. Much less
> > >> code that can go wrong.
> > >> -Daniel
> > >
> > >
> > > Regarding COW, i was looking into how to properly implement it from
> > > within the fault handler (i.e. ttm_bo_vm_fault)
> > > and the main obstacle I hit is that of exclusive access to the
> > > vm_area_struct, i need to be able to modify
> > > vma->vm_flags (and vm_page_prot)  to remove VM_SHARED bit so COW can
> > > be triggered on subsequent write access
> > > fault (here
> > > https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
> > > but core mm takes only read side mm_sem (here for example
> > > https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488)
> > > and so I am not supposed to modify vm_area_struct in this case. I am
> > > not sure if it's legit to write lock tthe mm_sem from this point.
> > > I found some discussions about this here
> > > http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it
> > > wasn't really clear to me
> > > what's the solution.
> > >
> > > In any case, seems to me that easier and more memory saving solution
> > > would be to just switch to per ttm bo dumy rw page that
> > > would be allocated on demand as you suggested here.  This should also
> > > take care of imported BOs and flink cases.
> > > Then i can drop the per device FD and per GEM object FD dummy BO and
> > > the ugly loop i am using in patch 2 to match faulting BO to the right
> > > dummy page.
> > >
> > > Does this makes sense ?
> >
> > I still don't see the information leak as much of a problem, but if
> > Daniel insists we should probably do this.
>
> Well amdgpu doesn't clear buffers by default, so indeed you guys are a
> lot more laissez-faire here. But in general we really don't do that
> kind of leaking. Iirc there are even radeonsi bugs because nothing else
> clears, and radeonsi happily displays gunk :-)

btw I think not clearing at alloc breaks the render node model a bit.
Without that this was all fine, since system pages still got cleared
by alloc_page(), and we only leaked vram. And for the legacy node
model with authentication of clients against the X server, leaking
that all around was ok. With render nodes no leaking should happen,
with no knob for userspace to opt out of the forced clearing.
-Daniel

> > But could we at least have only one page per client instead of per BO?
>
> I think you can do one page per file descriptor or something like
> that. But gets annoying with shared bo, especially with dma_buf_mmap
> forwarding.
> -Daniel
>
> >
> > Thanks,
> > Christian.
> >
> > >
> > > Andrey
> > >
> > >
> > >>
> > >>> Regards,
> > >>> Christian.
> > >>>
> > >>>> We can indeed optimize by allocating this dummy page on the first page
> > >>>> fault after device disconnect instead on GEM object creation.
> > >>>>
> > >>>> Andrey
> > >>>>
> > >>>>
> > >>>>>> mutex_unlock(&file_priv->prime.lock);
> > >>>>>>        if (ret)
> > >>>>>>            goto fail;
> > >>>>>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct
> > >>>>>> drm_gem_object *obj, struct sg_table *sg)
> > >>>>>>            dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
> > >>>>>>        dma_buf = attach->dmabuf;
> > >>>>>>        dma_buf_detach(attach->dmabuf, attach);
> > >>>>>> +
> > >>>>>> +    __free_page(obj->dummy_page);
> > >>>>>> +
> > >>>>>>        /* remove the reference */
> > >>>>>>        dma_buf_put(dma_buf);
> > >>>>>>    }
> > >>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > >>>>>> index 19df802..349a658 100644
> > >>>>>> --- a/include/drm/drm_file.h
> > >>>>>> +++ b/include/drm/drm_file.h
> > >>>>>> @@ -335,6 +335,8 @@ struct drm_file {
> > >>>>>>         */
> > >>>>>>        struct drm_prime_file_private prime;
> > >>>>>>    +    struct page *dummy_page;
> > >>>>>> +
> > >>>>>>        /* private: */
> > >>>>>>    #if IS_ENABLED(CONFIG_DRM_LEGACY)
> > >>>>>>        unsigned long lock_count; /* DRI1 legacy lock count */
> > >>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > >>>>>> index 0b37506..47460d1 100644
> > >>>>>> --- a/include/drm/drm_gem.h
> > >>>>>> +++ b/include/drm/drm_gem.h
> > >>>>>> @@ -310,6 +310,8 @@ struct drm_gem_object {
> > >>>>>>         *
> > >>>>>>         */
> > >>>>>>        const struct drm_gem_object_funcs *funcs;
> > >>>>>> +
> > >>>>>> +    struct page *dummy_page;
> > >>>>>>    };
> > >>>>>>      /**
> > >>
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-14  9:51               ` Daniel Vetter
  2020-11-14  9:57                 ` Daniel Vetter
@ 2020-11-15  6:34                 ` Andrey Grodzovsky
  2020-11-16  9:48                   ` Christian König
  1 sibling, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-15  6:34 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: Alex Deucher, Michel Dänzer, amd-gfx list, Pekka Paalanen,
	dri-devel


On 11/14/20 4:51 AM, Daniel Vetter wrote:
> On Sat, Nov 14, 2020 at 9:41 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> On 13.11.20 at 21:52, Andrey Grodzovsky wrote:
>>> On 6/22/20 1:50 PM, Daniel Vetter wrote:
>>>> On Mon, Jun 22, 2020 at 7:45 PM Christian König
>>>> <christian.koenig@amd.com> wrote:
>>>>> On 22.06.20 at 16:32, Andrey Grodzovsky wrote:
>>>>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>>>>> On 21.06.20 at 08:03, Andrey Grodzovsky wrote:
>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>> device is removed.
>>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>>>> index c4c704e..67c0770 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>>>>> drm_minor *minor)
>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>         }
>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>> +        ret = -ENOMEM;
>>>>>>>> +        goto out_prime_destroy;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         return file;
>>>>>>>>       out_prime_destroy:
>>>>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>             dev->driver->postclose(dev, file);
>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>> +
>>>>>>>> drm_prime_destroy_file_private(&file->prime);
>>>>>>>>           WARN_ON(!list_empty(&file->event_list));
>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c
>>>>>>>> b/drivers/gpu/drm/drm_prime.c
>>>>>>>> index 1de2cde..c482e9c 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>>>>> drm_device *dev,
>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>                 dma_buf, *handle);
>>>>>>>> +
>>>>>>>> +    if (!ret) {
>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>> +            ret = -ENOMEM;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>> While the per file case still looks acceptable this is a clear NAK
>>>>>>> since it will massively increase the memory needed for a prime
>>>>>>> exported object.
>>>>>>>
>>>>>>> I think that this is quite overkill in the first place and for the
>>>>>>> hot unplug case we can just use the global dummy page as well.
>>>>>>>
>>>>>>> Christian.
>>>>>> Global dummy page is good for read access, what do you do on write
>>>>>> access ? My first approach was indeed to map at first global dummy
>>>>>> page as read only and mark the vma->vm_flags as !VM_SHARED assuming
>>>>>> that this would trigger Copy On Write flow in core mm
>>>>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
>>>>>>
>>>>>> on the next page fault to same address triggered by a write access but
>>>>>> then i realized a new COW page will be allocated for each such mapping
>>>>>> and this is much more wasteful then having a dedicated page per GEM
>>>>>> object.
>>>>> Yeah, but this is only for a very very small corner cases. What we need
>>>>> to prevent is increasing the memory usage during normal operation to
>>>>> much.
>>>>>
>>>>> Using memory during the unplug is completely unproblematic because we
>>>>> just released quite a bunch of it by releasing all those system memory
>>>>> buffers.
>>>>>
>>>>> And I'm pretty sure that COWed pages are correctly accounted towards
>>>>> the
>>>>> used memory of a process.
>>>>>
>>>>> So I think if that approach works as intended and the COW pages are
>>>>> released again on unmapping it would be the perfect solution to the
>>>>> problem.
>>>>>
>>>>> Daniel what do you think?
>>>> If COW works, sure sounds reasonable. And if we can make sure we
>>>> managed to drop all the system allocations (otherwise suddenly 2x
>>>> memory usage, worst case). But I have no idea whether we can
>>>> retroshoehorn that into an established vma, you might have fun stuff
>>>> like a mkwrite handler there (which I thought is the COW handler
>>>> thing, but really no idea).
>>>>
>>>> If we need to massively change stuff then I think rw dummy page,
>>>> allocated on first fault after hotunplug (maybe just make it one per
>>>> object, that's simplest) seems like the much safer option. Much less
>>>> code that can go wrong.
>>>> -Daniel
>>>
>>> Regarding COW, i was looking into how to properly implement it from
>>> within the fault handler (i.e. ttm_bo_vm_fault)
>>> and the main obstacle I hit is that of exclusive access to the
>>> vm_area_struct, i need to be able to modify
>>> vma->vm_flags (and vm_page_prot)  to remove VM_SHARED bit so COW can
>>> be triggered on subsequent write access
>>> fault (here
>>> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
>>> but core mm takes only read side mm_sem (here for example
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fiommu%2Famd%2Fiommu_v2.c%23L488&amp;data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C00053e9d983041ed63ae08d88882ed87%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637409443224016377%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=h360c75Upl3%2FW7im7M1%2BxY%2FXy4gxin%2BkCF1Ui2zFXMs%3D&amp;reserved=0)
>>> and so I am not supposed to modify vm_area_struct in this case. I am
>>> not sure if it's legit to write lock tthe mm_sem from this point.
>>> I found some discussions about this here
>>> https://nam11.safelinks.protection.outlook.com/?url=http%3A%2F%2Flkml.iu.edu%2Fhypermail%2Flinux%2Fkernel%2F1909.1%2F02754.html&amp;data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C00053e9d983041ed63ae08d88882ed87%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637409443224021379%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=sx6s1lH%2FvxbIZajc4Yr49vFhxvPEnBHZlTt52D8qvZA%3D&amp;reserved=0 but it
>>> wasn't really clear to me
>>> what's the solution.
>>>
>>> In any case, seems to me that easier and more memory saving solution
>>> would be to just switch to per ttm bo dumy rw page that
>>> would be allocated on demand as you suggested here.  This should also
>>> take care of imported BOs and flink cases.
>>> Then i can drop the per device FD and per GEM object FD dummy BO and
>>> the ugly loop i am using in patch 2 to match faulting BO to the right
>>> dummy page.
>>>
>>> Does this makes sense ?
>> I still don't see the information leak as much of a problem, but if
>> Daniel insists we should probably do this.
> Well amdgpu doesn't clear buffers by default, so indeed you guys are a
> lot more laissez-faire here. But in general we really don't do that
> kind of leaking. Iirc there's even radeonsi bugs because else clears,
> and radeonsi happily displays gunk :-)
>
>> But could we at least have only one page per client instead of per BO?
> I think you can do one page per file descriptor or something like
> that. But gets annoying with shared bo, especially with dma_buf_mmap
> forwarding.
> -Daniel


Christian - is your concern more with too many page allocations or with the
extra pointer member cluttering the TTM BO struct? Because we can allocate
the dummy page on demand, only when needed. It just seems to me that keeping
it per BO streamlines the code, as I don't need to have different handling
for local vs imported BOs.
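
For illustration, a minimal sketch of the on-demand variant - assuming a
(hypothetical) dummy_page member on ttm_buffer_object, assuming the fault
handler holds the BO reservation so the lazy allocation doesn't race, and
leaving out freeing the page on BO destruction:

    /* Resolve the per-BO dummy page lazily; nothing is allocated until
     * the device is gone and a CPU fault actually hits this BO. */
    static struct page *ttm_bo_get_dummy_page(struct ttm_buffer_object *bo)
    {
            if (!bo->dummy_page)
                    bo->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);

            return bo->dummy_page;
    }

    static vm_fault_t ttm_bo_vm_dummy_fault(struct vm_fault *vmf)
    {
            struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
            struct page *page = ttm_bo_get_dummy_page(bo);

            if (!page)
                    return VM_FAULT_OOM;

            /* Map the same RW page at every faulting address of this BO,
             * so reads and writes after unplug are harmless. */
            return vmf_insert_pfn(vmf->vma, vmf->address,
                                  page_to_pfn(page));
    }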

Andrey


>
>> Thanks,
>> Christian.
>>
>>> Andrey
>>>
>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> We can indeed optimize by allocating this dummy page on the first page
>>>>>> fault after device disconnect instead of on GEM object creation.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>> mutex_unlock(&file_priv->prime.lock);
>>>>>>>>         if (ret)
>>>>>>>>             goto fail;
>>>>>>>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct
>>>>>>>> drm_gem_object *obj, struct sg_table *sg)
>>>>>>>>             dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>>>>>         dma_buf = attach->dmabuf;
>>>>>>>>         dma_buf_detach(attach->dmabuf, attach);
>>>>>>>> +
>>>>>>>> +    __free_page(obj->dummy_page);
>>>>>>>> +
>>>>>>>>         /* remove the reference */
>>>>>>>>         dma_buf_put(dma_buf);
>>>>>>>>     }
>>>>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>>>>> index 19df802..349a658 100644
>>>>>>>> --- a/include/drm/drm_file.h
>>>>>>>> +++ b/include/drm/drm_file.h
>>>>>>>> @@ -335,6 +335,8 @@ struct drm_file {
>>>>>>>>          */
>>>>>>>>         struct drm_prime_file_private prime;
>>>>>>>>     +    struct page *dummy_page;
>>>>>>>> +
>>>>>>>>         /* private: */
>>>>>>>>     #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>>>>>         unsigned long lock_count; /* DRI1 legacy lock count */
>>>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>>>>> index 0b37506..47460d1 100644
>>>>>>>> --- a/include/drm/drm_gem.h
>>>>>>>> +++ b/include/drm/drm_gem.h
>>>>>>>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>>>>>>>          *
>>>>>>>>          */
>>>>>>>>         const struct drm_gem_object_funcs *funcs;
>>>>>>>> +
>>>>>>>> +    struct page *dummy_page;
>>>>>>>>     };
>>>>>>>>       /**
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-06-22  9:35   ` Daniel Vetter
  2020-06-22 14:21     ` Pekka Paalanen
  2020-11-09 20:34     ` Andrey Grodzovsky
@ 2020-11-15  6:39     ` Andrey Grodzovsky
  2 siblings, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-15  6:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher


On 6/22/20 5:35 AM, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 02:03:01AM -0400, Andrey Grodzovsky wrote:
>> Will be used to reroute CPU mapped BO's page faults once
>> device is removed.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>   include/drm/drm_file.h      |  2 ++
>>   include/drm/drm_gem.h       |  2 ++
>>   4 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index c4c704e..67c0770 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>   			goto out_prime_destroy;
>>   	}
>>   
>> +	file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +	if (!file->dummy_page) {
>> +		ret = -ENOMEM;
>> +		goto out_prime_destroy;
>> +	}
>> +
>>   	return file;
>>   
>>   out_prime_destroy:
>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>   	if (dev->driver->postclose)
>>   		dev->driver->postclose(dev, file);
>>   
>> +	__free_page(file->dummy_page);
>> +
>>   	drm_prime_destroy_file_private(&file->prime);
>>   
>>   	WARN_ON(!list_empty(&file->event_list));
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index 1de2cde..c482e9c 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>   
>>   	ret = drm_prime_add_buf_handle(&file_priv->prime,
>>   			dma_buf, *handle);
>> +
>> +	if (!ret) {
>> +		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +		if (!obj->dummy_page)
>> +			ret = -ENOMEM;
>> +	}
>> +
>>   	mutex_unlock(&file_priv->prime.lock);
>>   	if (ret)
>>   		goto fail;
>> @@ -1006,6 +1013,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
>>   		dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>   	dma_buf = attach->dmabuf;
>>   	dma_buf_detach(attach->dmabuf, attach);
>> +
>> +	__free_page(obj->dummy_page);
>> +
>>   	/* remove the reference */
>>   	dma_buf_put(dma_buf);
>>   }
>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>> index 19df802..349a658 100644
>> --- a/include/drm/drm_file.h
>> +++ b/include/drm/drm_file.h
>> @@ -335,6 +335,8 @@ struct drm_file {
>>   	 */
>>   	struct drm_prime_file_private prime;
>>   
> Kerneldoc for these please, including why we need them and when. E.g. the
> one in gem_bo should say it's only for exported buffers, so that we're not
> colliding security spaces.
>
>> +	struct page *dummy_page;
>> +
>>   	/* private: */
>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>   	unsigned long lock_count; /* DRI1 legacy lock count */
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index 0b37506..47460d1 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -310,6 +310,8 @@ struct drm_gem_object {
>>   	 *
>>   	 */
>>   	const struct drm_gem_object_funcs *funcs;
>> +
>> +	struct page *dummy_page;
>>   };
> I think amdgpu doesn't care, but everyone else still might care somewhat
> about flink. That also shares buffers, so also needs to allocate the
> per-bo dummy page.


Not familiar with FLINK, so I read a bit here: https://lwn.net/Articles/283798/
(sections 3 and 4, about FLINK naming and later mapping). I don't see a difference
between FLINK and local BO mapping, as opening by FLINK name returns a handle
to the same BO as the original. Why then do we need special handling for FLINK?
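
For reference, the flink flow in question - a rough userspace sketch (fd_a,
fd_b and bo_handle are placeholders) of two clients ending up with handles
to the same underlying GEM object:

    #include <xf86drm.h>   /* drmIoctl(), struct drm_gem_flink/_open */

    /* Process A: publish a global flink name for its BO. */
    struct drm_gem_flink flink = { .handle = bo_handle };
    drmIoctl(fd_a, DRM_IOCTL_GEM_FLINK, &flink);

    /* Process B: open the same underlying drm_gem_object by name.
     * Any later mmap still goes through B's own DRM fd. */
    struct drm_gem_open gopen = { .name = flink.name };
    drmIoctl(fd_b, DRM_IOCTL_GEM_OPEN, &gopen);
    /* gopen.handle now refers to the same GEM object as bo_handle. */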

Andrey


>
> I also wonder whether we shouldn't have a helper to look up the dummy
> page, just to encode in core code how it's supposed to cascade.
> -Daniel
>
>>   
>>   /**
>> -- 
>> 2.7.4
>>
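
A cascading lookup helper along the lines Daniel suggests could look roughly
like this - only a sketch, built on the per-object and per-file dummy_page
fields added by this patch:

    /* Sketch: pick the dummy page to map for a faulting object.
     * Prefer the per-object page (set for prime-shared buffers) and
     * fall back to the per-file page for plain local BOs. */
    static struct page *drm_gem_dummy_page(struct drm_gem_object *obj,
                                           struct drm_file *file_priv)
    {
            if (obj->dummy_page)
                    return obj->dummy_page;

            return file_priv->dummy_page;
    }
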
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-14  9:57                 ` Daniel Vetter
@ 2020-11-16  9:42                   ` Michel Dänzer
  0 siblings, 0 replies; 97+ messages in thread
From: Michel Dänzer @ 2020-11-16  9:42 UTC (permalink / raw)
  To: Daniel Vetter, Christian König; +Cc: dri-devel, amd-gfx list

On 2020-11-14 10:57 a.m., Daniel Vetter wrote:
> On Sat, Nov 14, 2020 at 10:51 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>>
>> On Sat, Nov 14, 2020 at 9:41 AM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>
>>> Am 13.11.20 um 21:52 schrieb Andrey Grodzovsky:
>>>>
>>>> On 6/22/20 1:50 PM, Daniel Vetter wrote:
>>>>> On Mon, Jun 22, 2020 at 7:45 PM Christian König
>>>>> <christian.koenig@amd.com> wrote:
>>>>>> Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
>>>>>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>>>>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>> device is removed.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>>>>> index c4c704e..67c0770 100644
>>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>>>>>> drm_minor *minor)
>>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>>         }
>>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>>> +        ret = -ENOMEM;
>>>>>>>>> +        goto out_prime_destroy;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>         return file;
>>>>>>>>>       out_prime_destroy:
>>>>>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>>             dev->driver->postclose(dev, file);
>>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>>> +
>>>>>>>>> drm_prime_destroy_file_private(&file->prime);
>>>>>>>>>           WARN_ON(!list_empty(&file->event_list));
>>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c
>>>>>>>>> b/drivers/gpu/drm/drm_prime.c
>>>>>>>>> index 1de2cde..c482e9c 100644
>>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>>>>>> drm_device *dev,
>>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>>                 dma_buf, *handle);
>>>>>>>>> +
>>>>>>>>> +    if (!ret) {
>>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>>> +            ret = -ENOMEM;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>> While the per file case still looks acceptable this is a clear NAK
>>>>>>>> since it will massively increase the memory needed for a prime
>>>>>>>> exported object.
>>>>>>>>
>>>>>>>> I think that this is quite overkill in the first place and for the
>>>>>>>> hot unplug case we can just use the global dummy page as well.
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>
>>>>>>> Global dummy page is good for read access, but what do you do on write
>>>>>>> access? My first approach was indeed to map the global dummy page at
>>>>>>> first as read only and mark the vma->vm_flags as !VM_SHARED, assuming
>>>>>>> that this would trigger the Copy On Write flow in core mm
>>>>>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
>>>>>>>
>>>>>>> on the next page fault to the same address triggered by a write access,
>>>>>>> but then I realized a new COW page will be allocated for each such
>>>>>>> mapping, and this is much more wasteful than having a dedicated page
>>>>>>> per GEM object.
>>>>>> Yeah, but this is only for a very, very small corner case. What we need
>>>>>> to prevent is increasing the memory usage during normal operation too
>>>>>> much.
>>>>>>
>>>>>> Using memory during the unplug is completely unproblematic because we
>>>>>> just released quite a bunch of it by releasing all those system memory
>>>>>> buffers.
>>>>>>
>>>>>> And I'm pretty sure that COWed pages are correctly accounted towards
>>>>>> the
>>>>>> used memory of a process.
>>>>>>
>>>>>> So I think if that approach works as intended and the COW pages are
>>>>>> released again on unmapping it would be the perfect solution to the
>>>>>> problem.
>>>>>>
>>>>>> Daniel what do you think?
>>>>> If COW works, sure sounds reasonable. And if we can make sure we
>>>>> managed to drop all the system allocations (otherwise suddenly 2x
>>>>> memory usage, worst case). But I have no idea whether we can
>>>>> retroshoehorn that into an established vma, you might have fun stuff
>>>>> like a mkwrite handler there (which I thought is the COW handler
>>>>> thing, but really no idea).
>>>>>
>>>>> If we need to massively change stuff then I think rw dummy page,
>>>>> allocated on first fault after hotunplug (maybe just make it one per
>>>>> object, that's simplest) seems like the much safer option. Much less
>>>>> code that can go wrong.
>>>>> -Daniel
>>>>
>>>>
>>>> Regarding COW, I was looking into how to properly implement it from
>>>> within the fault handler (i.e. ttm_bo_vm_fault),
>>>> and the main obstacle I hit is that of exclusive access to the
>>>> vm_area_struct: I need to be able to modify
>>>> vma->vm_flags (and vm_page_prot) to remove the VM_SHARED bit so COW can
>>>> be triggered on a subsequent write access
>>>> fault (here
>>>> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
>>>> but core mm takes only the read side of mm_sem (here for example
>>>> https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488)
>>>> and so I am not supposed to modify the vm_area_struct in this case. I am
>>>> not sure if it's legit to write lock the mm_sem from this point.
>>>> I found some discussions about this here
>>>> http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it
>>>> wasn't really clear to me
>>>> what the solution is.
>>>>
>>>> In any case, it seems to me that an easier and more memory-saving
>>>> solution would be to just switch to a per-TTM-BO dummy rw page that
>>>> would be allocated on demand, as you suggested here. This should also
>>>> take care of imported BOs and flink cases.
>>>> Then I can drop the per-device-FD and per-GEM-object dummy pages and
>>>> the ugly loop I am using in patch 2 to match the faulting BO to the
>>>> right dummy page.
>>>>
>>>> Does this make sense?
>>>
>>> I still don't see the information leak as much of a problem, but if
>>> Daniel insists we should probably do this.
>>
>> Well amdgpu doesn't clear buffers by default, so indeed you guys are a
>> lot more laissez-faire here. But in general we really don't do that
>> kind of leaking. Iirc there's even radeonsi bugs because else clears,
>> and radeonsi happily displays gunk :-)
> 
> btw I think not clearing at alloc breaks the render node model a bit.
> Without that this was all fine, since system pages still got cleared
> by alloc_page(), and we only leaked vram. And for the legacy node
> model with authentication of clients against the X server, leaking
> that all around was ok. With render nodes no leaking should happen,
> with no knob for userspace to opt out of the forced clearing.

Seconded.


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-15  6:34                 ` Andrey Grodzovsky
@ 2020-11-16  9:48                   ` Christian König
  2020-11-16 19:00                     ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Christian König @ 2020-11-16  9:48 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: Alex Deucher, Michel Dänzer, amd-gfx list, Pekka Paalanen,
	dri-devel

Am 15.11.20 um 07:34 schrieb Andrey Grodzovsky:
>
> On 11/14/20 4:51 AM, Daniel Vetter wrote:
>> On Sat, Nov 14, 2020 at 9:41 AM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Am 13.11.20 um 21:52 schrieb Andrey Grodzovsky:
>>>> On 6/22/20 1:50 PM, Daniel Vetter wrote:
>>>>> On Mon, Jun 22, 2020 at 7:45 PM Christian König
>>>>> <christian.koenig@amd.com> wrote:
>>>>>> Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
>>>>>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>>>>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>> device is removed.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c 
>>>>>>>>> b/drivers/gpu/drm/drm_file.c
>>>>>>>>> index c4c704e..67c0770 100644
>>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>>>>>> drm_minor *minor)
>>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>>         }
>>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>>> +        ret = -ENOMEM;
>>>>>>>>> +        goto out_prime_destroy;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>         return file;
>>>>>>>>>       out_prime_destroy:
>>>>>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>>             dev->driver->postclose(dev, file);
>>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>>> +
>>>>>>>>> drm_prime_destroy_file_private(&file->prime);
>>>>>>>>> WARN_ON(!list_empty(&file->event_list));
>>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c
>>>>>>>>> b/drivers/gpu/drm/drm_prime.c
>>>>>>>>> index 1de2cde..c482e9c 100644
>>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>>>>>> drm_device *dev,
>>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>>                 dma_buf, *handle);
>>>>>>>>> +
>>>>>>>>> +    if (!ret) {
>>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>>> +            ret = -ENOMEM;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>> While the per file case still looks acceptable this is a clear NAK
>>>>>>>> since it will massively increase the memory needed for a prime
>>>>>>>> exported object.
>>>>>>>>
>>>>>>>> I think that this is quite overkill in the first place and for the
>>>>>>>> hot unplug case we can just use the global dummy page as well.
>>>>>>>>
>>>>>>>> Christian.
>>>>>>> Global dummy page is good for read access, but what do you do on write
>>>>>>> access? My first approach was indeed to map the global dummy page at
>>>>>>> first as read only and mark the vma->vm_flags as !VM_SHARED, assuming
>>>>>>> that this would trigger the Copy On Write flow in core mm
>>>>>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
>>>>>>>
>>>>>>> on the next page fault to the same address triggered by a write access,
>>>>>>> but then I realized a new COW page will be allocated for each such
>>>>>>> mapping, and this is much more wasteful than having a dedicated page
>>>>>>> per GEM object.
>>>>>> Yeah, but this is only for a very, very small corner case. What we need
>>>>>> to prevent is increasing the memory usage during normal operation too
>>>>>> much.
>>>>>>
>>>>>> Using memory during the unplug is completely unproblematic 
>>>>>> because we
>>>>>> just released quite a bunch of it by releasing all those system 
>>>>>> memory
>>>>>> buffers.
>>>>>>
>>>>>> And I'm pretty sure that COWed pages are correctly accounted towards
>>>>>> the
>>>>>> used memory of a process.
>>>>>>
>>>>>> So I think if that approach works as intended and the COW pages are
>>>>>> released again on unmapping it would be the perfect solution to the
>>>>>> problem.
>>>>>>
>>>>>> Daniel what do you think?
>>>>> If COW works, sure sounds reasonable. And if we can make sure we
>>>>> managed to drop all the system allocations (otherwise suddenly 2x
>>>>> memory usage, worst case). But I have no idea whether we can
>>>>> retroshoehorn that into an established vma, you might have fun stuff
>>>>> like a mkwrite handler there (which I thought is the COW handler
>>>>> thing, but really no idea).
>>>>>
>>>>> If we need to massively change stuff then I think rw dummy page,
>>>>> allocated on first fault after hotunplug (maybe just make it one per
>>>>> object, that's simplest) seems like the much safer option. Much less
>>>>> code that can go wrong.
>>>>> -Daniel
>>>>
>>>> Regarding COW, I was looking into how to properly implement it from
>>>> within the fault handler (i.e. ttm_bo_vm_fault),
>>>> and the main obstacle I hit is that of exclusive access to the
>>>> vm_area_struct: I need to be able to modify
>>>> vma->vm_flags (and vm_page_prot) to remove the VM_SHARED bit so COW can
>>>> be triggered on a subsequent write access
>>>> fault (here
>>>> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
>>>> but core mm takes only the read side of mm_sem (here for example
>>>> https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488)
>>>> and so I am not supposed to modify the vm_area_struct in this case. I am
>>>> not sure if it's legit to write lock the mm_sem from this point.
>>>> I found some discussions about this here
>>>> http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it
>>>> wasn't really clear to me
>>>> what the solution is.
>>>>
>>>> In any case, it seems to me that an easier and more memory-saving
>>>> solution would be to just switch to a per-TTM-BO dummy rw page that
>>>> would be allocated on demand, as you suggested here. This should also
>>>> take care of imported BOs and flink cases.
>>>> Then I can drop the per-device-FD and per-GEM-object dummy pages and
>>>> the ugly loop I am using in patch 2 to match the faulting BO to the
>>>> right dummy page.
>>>>
>>>> Does this make sense?
>>> I still don't see the information leak as much of a problem, but if
>>> Daniel insists we should probably do this.
>> Well amdgpu doesn't clear buffers by default, so indeed you guys are a
>> lot more laissez-faire here. But in general we really don't do that
>> kind of leaking. Iirc there's even radeonsi bugs because else clears,
>> and radeonsi happily displays gunk :-)
>>
>>> But could we at least have only one page per client instead of per BO?
>> I think you can do one page per file descriptor or something like
>> that. But gets annoying with shared bo, especially with dma_buf_mmap
>> forwarding.
>> -Daniel
>
>
> Christian - is your concern more with too many page allocations or
> with the extra pointer member
> cluttering the TTM BO struct?

Yes, that is one problem.

> Because we can allocate the dummy page on demand, only when
> needed. It just seems to me that keeping it per BO streamlines the
> code, as I don't need to
> have different handling for local vs imported BOs.

Why should you have a difference between local vs imported BOs?

Christian.

>
> Andrey

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-16  9:48                   ` Christian König
@ 2020-11-16 19:00                     ` Andrey Grodzovsky
  2020-11-16 20:36                       ` Christian König
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-16 19:00 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: Alex Deucher, Michel Dänzer, amd-gfx list, Pekka Paalanen,
	dri-devel


On 11/16/20 4:48 AM, Christian König wrote:
> Am 15.11.20 um 07:34 schrieb Andrey Grodzovsky:
>>
>> On 11/14/20 4:51 AM, Daniel Vetter wrote:
>>> On Sat, Nov 14, 2020 at 9:41 AM Christian König
>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>> Am 13.11.20 um 21:52 schrieb Andrey Grodzovsky:
>>>>> On 6/22/20 1:50 PM, Daniel Vetter wrote:
>>>>>> On Mon, Jun 22, 2020 at 7:45 PM Christian König
>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>> Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
>>>>>>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>>>>>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>> device is removed.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>> ---
>>>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>>>>>> index c4c704e..67c0770 100644
>>>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>>>>>>> drm_minor *minor)
>>>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>>>         }
>>>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>>>> +        ret = -ENOMEM;
>>>>>>>>>> +        goto out_prime_destroy;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>>         return file;
>>>>>>>>>>       out_prime_destroy:
>>>>>>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>>>             dev->driver->postclose(dev, file);
>>>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>>>> +
>>>>>>>>>> drm_prime_destroy_file_private(&file->prime);
>>>>>>>>>> WARN_ON(!list_empty(&file->event_list));
>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c
>>>>>>>>>> b/drivers/gpu/drm/drm_prime.c
>>>>>>>>>> index 1de2cde..c482e9c 100644
>>>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>>>>>>> drm_device *dev,
>>>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>>>                 dma_buf, *handle);
>>>>>>>>>> +
>>>>>>>>>> +    if (!ret) {
>>>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>>>> +            ret = -ENOMEM;
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>> While the per file case still looks acceptable this is a clear NAK
>>>>>>>>> since it will massively increase the memory needed for a prime
>>>>>>>>> exported object.
>>>>>>>>>
>>>>>>>>> I think that this is quite overkill in the first place and for the
>>>>>>>>> hot unplug case we can just use the global dummy page as well.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>> Global dummy page is good for read access, but what do you do on write
>>>>>>>> access? My first approach was indeed to map the global dummy page at
>>>>>>>> first as read only and mark the vma->vm_flags as !VM_SHARED, assuming
>>>>>>>> that this would trigger the Copy On Write flow in core mm
>>>>>>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
>>>>>>>>
>>>>>>>> on the next page fault to the same address triggered by a write access,
>>>>>>>> but then I realized a new COW page will be allocated for each such
>>>>>>>> mapping, and this is much more wasteful than having a dedicated page
>>>>>>>> per GEM object.
>>>>>>> Yeah, but this is only for a very, very small corner case. What we need
>>>>>>> to prevent is increasing the memory usage during normal operation too
>>>>>>> much.
>>>>>>>
>>>>>>> Using memory during the unplug is completely unproblematic because we
>>>>>>> just released quite a bunch of it by releasing all those system memory
>>>>>>> buffers.
>>>>>>>
>>>>>>> And I'm pretty sure that COWed pages are correctly accounted towards
>>>>>>> the
>>>>>>> used memory of a process.
>>>>>>>
>>>>>>> So I think if that approach works as intended and the COW pages are
>>>>>>> released again on unmapping it would be the perfect solution to the
>>>>>>> problem.
>>>>>>>
>>>>>>> Daniel what do you think?
>>>>>> If COW works, sure sounds reasonable. And if we can make sure we
>>>>>> managed to drop all the system allocations (otherwise suddenly 2x
>>>>>> memory usage, worst case). But I have no idea whether we can
>>>>>> retroshoehorn that into an established vma, you might have fun stuff
>>>>>> like a mkwrite handler there (which I thought is the COW handler
>>>>>> thing, but really no idea).
>>>>>>
>>>>>> If we need to massively change stuff then I think rw dummy page,
>>>>>> allocated on first fault after hotunplug (maybe just make it one per
>>>>>> object, that's simplest) seems like the much safer option. Much less
>>>>>> code that can go wrong.
>>>>>> -Daniel
>>>>>
>>>>> Regarding COW, I was looking into how to properly implement it from
>>>>> within the fault handler (i.e. ttm_bo_vm_fault),
>>>>> and the main obstacle I hit is that of exclusive access to the
>>>>> vm_area_struct: I need to be able to modify
>>>>> vma->vm_flags (and vm_page_prot) to remove the VM_SHARED bit so COW can
>>>>> be triggered on a subsequent write access
>>>>> fault (here
>>>>> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
>>>>> but core mm takes only the read side of mm_sem (here for example
>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488)
>>>>> and so I am not supposed to modify the vm_area_struct in this case. I am
>>>>> not sure if it's legit to write lock the mm_sem from this point.
>>>>> I found some discussions about this here
>>>>> http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it
>>>>> wasn't really clear to me
>>>>> what the solution is.
>>>>>
>>>>> In any case, it seems to me that an easier and more memory-saving
>>>>> solution would be to just switch to a per-TTM-BO dummy rw page that
>>>>> would be allocated on demand, as you suggested here. This should also
>>>>> take care of imported BOs and flink cases.
>>>>> Then I can drop the per-device-FD and per-GEM-object dummy pages and
>>>>> the ugly loop I am using in patch 2 to match the faulting BO to the
>>>>> right dummy page.
>>>>>
>>>>> Does this make sense?
>>>> I still don't see the information leak as much of a problem, but if
>>>> Daniel insists we should probably do this.
>>> Well amdgpu doesn't clear buffers by default, so indeed you guys are a
>>> lot more laissez-faire here. But in general we really don't do that
>>> kind of leaking. Iirc there's even radeonsi bugs because else clears,
>>> and radeonsi happily displays gunk :-)
>>>
>>>> But could we at least have only one page per client instead of per BO?
>>> I think you can do one page per file descriptor or something like
>>> that. But gets annoying with shared bo, especially with dma_buf_mmap
>>> forwarding.
>>> -Daniel
>>
>>
>> Christian - is your concern more with too many page allocations or
>> with the extra pointer member
>> cluttering the TTM BO struct?
>
> Yes, that is one problem.
>
>> Because we can allocate the dummy page on demand, only when
>> needed. It just seems to me that keeping it per BO streamlines the
>> code, as I don't need to
>> have different handling for local vs imported BOs.
>
> Why should you have a difference between local vs imported BOs?


For local BOs Daniel's suggestion to use vm_area_struct->vm_file->private_data
should work, as this points to the drm_file. For imported BOs private_data will
point to a dma_buf structure instead, since each imported BO is backed by a
pseudo file (created in dma_buf_getfile). If so, where should we store the
dummy RW page in this case? In the current implementation it's stored in
drm_gem_object.

P.S. For the FLINK case it seems to me the handling should be no different than
with a local BO, as the FD used for mmap in this case is still the same one
associated with the DRM file.
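
To make the two cases concrete, a sketch of what a fault-time lookup runs
into - is_drm_file() is hypothetical here, standing in for whatever check
would distinguish the two file types:

    static struct page *lookup_dummy_page(struct vm_fault *vmf)
    {
            struct file *f = vmf->vma->vm_file;

            if (is_drm_file(f)) {   /* hypothetical helper */
                    struct drm_file *file_priv = f->private_data;

                    return file_priv->dummy_page;
            }

            /* Imported BO: private_data is a struct dma_buf here (see
             * dma_buf_getfile()), so there is no drm_file to hold a
             * per-client dummy page - hence the question above. */
            return NULL;
    }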

Andrey


>
> Christian.
>
>>
>> Andrey
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-16 19:00                     ` Andrey Grodzovsky
@ 2020-11-16 20:36                       ` Christian König
  2020-11-16 20:42                         ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Christian König @ 2020-11-16 20:36 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: Alex Deucher, Michel Dänzer, amd-gfx list, Pekka Paalanen,
	dri-devel

Am 16.11.20 um 20:00 schrieb Andrey Grodzovsky:
>
> On 11/16/20 4:48 AM, Christian König wrote:
>> Am 15.11.20 um 07:34 schrieb Andrey Grodzovsky:
>>>
>>> On 11/14/20 4:51 AM, Daniel Vetter wrote:
>>>> On Sat, Nov 14, 2020 at 9:41 AM Christian König
>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>> Am 13.11.20 um 21:52 schrieb Andrey Grodzovsky:
>>>>>> On 6/22/20 1:50 PM, Daniel Vetter wrote:
>>>>>>> On Mon, Jun 22, 2020 at 7:45 PM Christian König
>>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>> Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
>>>>>>>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>>>>>>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>> device is removed.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>>> ---
>>>>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c 
>>>>>>>>>>> b/drivers/gpu/drm/drm_file.c
>>>>>>>>>>> index c4c704e..67c0770 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>>>>>>>> drm_minor *minor)
>>>>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>>>>         }
>>>>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | 
>>>>>>>>>>> __GFP_ZERO);
>>>>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>>>>> +        ret = -ENOMEM;
>>>>>>>>>>> +        goto out_prime_destroy;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>>>         return file;
>>>>>>>>>>>       out_prime_destroy:
>>>>>>>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>>>>             dev->driver->postclose(dev, file);
>>>>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>>>>> +
>>>>>>>>>>> drm_prime_destroy_file_private(&file->prime);
>>>>>>>>>>> WARN_ON(!list_empty(&file->event_list));
>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>> b/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>> index 1de2cde..c482e9c 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>>>>>>>> drm_device *dev,
>>>>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>>>>                 dma_buf, *handle);
>>>>>>>>>>> +
>>>>>>>>>>> +    if (!ret) {
>>>>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>>>>> +            ret = -ENOMEM;
>>>>>>>>>>> +    }
>>>>>>>>>>> +
>>>>>>>>>> While the per file case still looks acceptable this is a 
>>>>>>>>>> clear NAK
>>>>>>>>>> since it will massively increase the memory needed for a prime
>>>>>>>>>> exported object.
>>>>>>>>>>
>>>>>>>>>> I think that this is quite overkill in the first place and 
>>>>>>>>>> for the
>>>>>>>>>> hot unplug case we can just use the global dummy page as well.
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>> Global dummy page is good for read access, but what do you do on write
>>>>>>>>> access? My first approach was indeed to map the global dummy page at
>>>>>>>>> first as read only and mark the vma->vm_flags as !VM_SHARED, assuming
>>>>>>>>> that this would trigger the Copy On Write flow in core mm
>>>>>>>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
>>>>>>>>>
>>>>>>>>> on the next page fault to the same address triggered by a write access,
>>>>>>>>> but then I realized a new COW page will be allocated for each such
>>>>>>>>> mapping, and this is much more wasteful than having a dedicated page
>>>>>>>>> per GEM object.
>>>>>>>> Yeah, but this is only for a very, very small corner case. What we need
>>>>>>>> to prevent is increasing the memory usage during normal operation too
>>>>>>>> much.
>>>>>>>>
>>>>>>>> Using memory during the unplug is completely unproblematic 
>>>>>>>> because we
>>>>>>>> just released quite a bunch of it by releasing all those system 
>>>>>>>> memory
>>>>>>>> buffers.
>>>>>>>>
>>>>>>>> And I'm pretty sure that COWed pages are correctly accounted 
>>>>>>>> towards
>>>>>>>> the
>>>>>>>> used memory of a process.
>>>>>>>>
>>>>>>>> So I think if that approach works as intended and the COW pages 
>>>>>>>> are
>>>>>>>> released again on unmapping it would be the perfect solution to 
>>>>>>>> the
>>>>>>>> problem.
>>>>>>>>
>>>>>>>> Daniel what do you think?
>>>>>>> If COW works, sure sounds reasonable. And if we can make sure we
>>>>>>> managed to drop all the system allocations (otherwise suddenly 2x
>>>>>>> memory usage, worst case). But I have no idea whether we can
>>>>>>> retroshoehorn that into an established vma, you might have fun 
>>>>>>> stuff
>>>>>>> like a mkwrite handler there (which I thought is the COW handler
>>>>>>> thing, but really no idea).
>>>>>>>
>>>>>>> If we need to massively change stuff then I think rw dummy page,
>>>>>>> allocated on first fault after hotunplug (maybe just make it one 
>>>>>>> per
>>>>>>> object, that's simplest) seems like the much safer option. Much 
>>>>>>> less
>>>>>>> code that can go wrong.
>>>>>>> -Daniel
>>>>>>
>>>>>> Regarding COW, I was looking into how to properly implement it from
>>>>>> within the fault handler (i.e. ttm_bo_vm_fault),
>>>>>> and the main obstacle I hit is that of exclusive access to the
>>>>>> vm_area_struct: I need to be able to modify
>>>>>> vma->vm_flags (and vm_page_prot) to remove the VM_SHARED bit so COW can
>>>>>> be triggered on a subsequent write access
>>>>>> fault (here
>>>>>> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
>>>>>> but core mm takes only the read side of mm_sem (here for example
>>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488)
>>>>>> and so I am not supposed to modify the vm_area_struct in this case. I am
>>>>>> not sure if it's legit to write lock the mm_sem from this point.
>>>>>> I found some discussions about this here
>>>>>> http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html but it
>>>>>> wasn't really clear to me
>>>>>> what the solution is.
>>>>>>
>>>>>> In any case, it seems to me that an easier and more memory-saving
>>>>>> solution would be to just switch to a per-TTM-BO dummy rw page that
>>>>>> would be allocated on demand, as you suggested here. This should also
>>>>>> take care of imported BOs and flink cases.
>>>>>> Then I can drop the per-device-FD and per-GEM-object dummy pages and
>>>>>> the ugly loop I am using in patch 2 to match the faulting BO to the
>>>>>> right dummy page.
>>>>>>
>>>>>> Does this make sense?
>>>>> I still don't see the information leak as much of a problem, but if
>>>>> Daniel insists we should probably do this.
>>>> Well amdgpu doesn't clear buffers by default, so indeed you guys are a
>>>> lot more laissez-faire here. But in general we really don't do that
>>>> kind of leaking. Iirc there's even radeonsi bugs because else clears,
>>>> and radeonsi happily displays gunk :-)
>>>>
>>>>> But could we at least have only one page per client instead of per 
>>>>> BO?
>>>> I think you can do one page per file descriptor or something like
>>>> that. But gets annoying with shared bo, especially with dma_buf_mmap
>>>> forwarding.
>>>> -Daniel
>>>
>>>
>>> Christian - is your concern more with too many page allocations or
>>> with the extra pointer member
>>> cluttering the TTM BO struct?
>>
>> Yes, that is one problem.
>>
>>> Because we can allocate the dummy page on demand, only when
>>> needed. It just seems to me that keeping it per BO streamlines the
>>> code, as I don't need to
>>> have different handling for local vs imported BOs.
>>
>> Why should you have a difference between local vs imported BOs?
>
>
> For local BOs Daniel's suggestion to use
> vm_area_struct->vm_file->private_data
> should work, as this points to the drm_file. For imported BOs
> private_data will point to a dma_buf structure instead,
> since each imported BO is backed by a pseudo file (created in
> dma_buf_getfile).

Oh, good point. But we could easily fix that now. That should make the 
mapping code less complex as well.

Regards,
Christian.

> If so, where should we store the dummy RW page in this case? In the
> current implementation it's stored in drm_gem_object.
>
> P.S. For the FLINK case it seems to me the handling should be no
> different than with a local BO, as the
> FD used for mmap in this case is still the same one associated with
> the DRM file.
>
> Andrey
>
>
>>
>> Christian.
>>
>>>
>>> Andrey
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-16 20:36                       ` Christian König
@ 2020-11-16 20:42                         ` Andrey Grodzovsky
  2020-11-19 10:01                           ` Christian König
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-16 20:42 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: Alex Deucher, Michel Dänzer, amd-gfx list, Pekka Paalanen,
	dri-devel


On 11/16/20 3:36 PM, Christian König wrote:
> Am 16.11.20 um 20:00 schrieb Andrey Grodzovsky:
>>
>> On 11/16/20 4:48 AM, Christian König wrote:
>>> Am 15.11.20 um 07:34 schrieb Andrey Grodzovsky:
>>>>
>>>> On 11/14/20 4:51 AM, Daniel Vetter wrote:
>>>>> On Sat, Nov 14, 2020 at 9:41 AM Christian König
>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>> Am 13.11.20 um 21:52 schrieb Andrey Grodzovsky:
>>>>>>> On 6/22/20 1:50 PM, Daniel Vetter wrote:
>>>>>>>> On Mon, Jun 22, 2020 at 7:45 PM Christian König
>>>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>>> Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
>>>>>>>>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>>>>>>>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>> device is removed.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>>>>>>>> index c4c704e..67c0770 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>>>>>>>>> drm_minor *minor)
>>>>>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>>>>>         }
>>>>>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>>>>>> +        ret = -ENOMEM;
>>>>>>>>>>>> +        goto out_prime_destroy;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>>         return file;
>>>>>>>>>>>>       out_prime_destroy:
>>>>>>>>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>>>>> dev->driver->postclose(dev, file);
>>>>>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>>>>>> +
>>>>>>>>>>>> drm_prime_destroy_file_private(&file->prime);
>>>>>>>>>>>> WARN_ON(!list_empty(&file->event_list));
>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>>> b/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>>> index 1de2cde..c482e9c 100644
>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>>>>>>>>> drm_device *dev,
>>>>>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>>>>>                 dma_buf, *handle);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (!ret) {
>>>>>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>>>>>> +            ret = -ENOMEM;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>> While the per file case still looks acceptable this is a clear NAK
>>>>>>>>>>> since it will massively increase the memory needed for a prime
>>>>>>>>>>> exported object.
>>>>>>>>>>>
>>>>>>>>>>> I think that this is quite overkill in the first place and for the
>>>>>>>>>>> hot unplug case we can just use the global dummy page as well.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>> Global dummy page is good for read access, but what do you do on write
>>>>>>>>>> access? My first approach was indeed to map the global dummy page at
>>>>>>>>>> first as read only and mark the vma->vm_flags as !VM_SHARED, assuming
>>>>>>>>>> that this would trigger the Copy On Write flow in core mm
>>>>>>>>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977)
>>>>>>>>>>
>>>>>>>>>> on the next page fault to the same address triggered by a write access,
>>>>>>>>>> but then I realized a new COW page will be allocated for each such
>>>>>>>>>> mapping, and this is much more wasteful than having a dedicated page
>>>>>>>>>> per GEM object.
>>>>>>>>> Yeah, but this is only for a very, very small corner case. What we need
>>>>>>>>> to prevent is increasing the memory usage during normal operation too
>>>>>>>>> much.
>>>>>>>>>
>>>>>>>>> Using memory during the unplug is completely unproblematic because we
>>>>>>>>> just released quite a bunch of it by releasing all those system memory
>>>>>>>>> buffers.
>>>>>>>>>
>>>>>>>>> And I'm pretty sure that COWed pages are correctly accounted towards
>>>>>>>>> the
>>>>>>>>> used memory of a process.
>>>>>>>>>
>>>>>>>>> So I think if that approach works as intended and the COW pages are
>>>>>>>>> released again on unmapping it would be the perfect solution to the
>>>>>>>>> problem.
>>>>>>>>>
>>>>>>>>> Daniel what do you think?
>>>>>>>> If COW works, sure sounds reasonable. And if we can make sure we
>>>>>>>> managed to drop all the system allocations (otherwise suddenly 2x
>>>>>>>> memory usage, worst case). But I have no idea whether we can
>>>>>>>> retroshoehorn that into an established vma, you might have fun stuff
>>>>>>>> like a mkwrite handler there (which I thought is the COW handler
>>>>>>>> thing, but really no idea).
>>>>>>>>
>>>>>>>> If we need to massively change stuff then I think rw dummy page,
>>>>>>>> allocated on first fault after hotunplug (maybe just make it one per
>>>>>>>> object, that's simplest) seems like the much safer option. Much less
>>>>>>>> code that can go wrong.
>>>>>>>> -Daniel
>>>>>>>
>>>>>>> Regarding COW, I was looking into how to properly implement it from
>>>>>>> within the fault handler (i.e. ttm_bo_vm_fault), and the main
>>>>>>> obstacle I hit is that of exclusive access to the vm_area_struct: I
>>>>>>> need to be able to modify vma->vm_flags (and vm_page_prot) to remove
>>>>>>> the VM_SHARED bit so COW can be triggered on a subsequent write
>>>>>>> access fault (here
>>>>>>> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128)
>>>>>>>
>>>>>>> but core mm takes only the read side of mm_sem (here for example
>>>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488)
>>>>>>>
>>>>>>> and so I am not supposed to modify the vm_area_struct in this case.
>>>>>>> I am not sure if it's legit to write-lock the mm_sem from this point.
>>>>>>> I found some discussions about this here
>>>>>>> http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html
>>>>>>> but it wasn't really clear to me what the solution is.
>>>>>>>
>>>>>>> In any case, it seems to me that an easier and more memory-saving
>>>>>>> solution would be to just switch to a per-TTM-BO dummy RW page that
>>>>>>> would be allocated on demand as you suggested here. This should also
>>>>>>> take care of the imported BO and flink cases.
>>>>>>> Then I can drop the per-device FD and per-GEM-object FD dummy BO and
>>>>>>> the ugly loop I am using in patch 2 to match the faulting BO to the
>>>>>>> right dummy page.
>>>>>>>
>>>>>>> Does this make sense?
>>>>>> I still don't see the information leak as much of a problem, but if
>>>>>> Daniel insists we should probably do this.
>>>>> Well amdgpu doesn't clear buffers by default, so indeed you guys are a
>>>>> lot more laissez-faire here. But in general we really don't do that
>>>>> kind of leaking. Iirc there are even radeonsi bugs because nothing
>>>>> clears the memory, and radeonsi happily displays gunk :-)
>>>>>
>>>>>> But could we at least have only one page per client instead of per BO?
>>>>> I think you can do one page per file descriptor or something like
>>>>> that. But it gets annoying with shared BOs, especially with
>>>>> dma_buf_mmap forwarding.
>>>>> -Daniel
>>>>
>>>>
>>>> Christian - is your concern more with too many page allocations or with
>>>> the extra pointer member cluttering the TTM BO struct?
>>>
>>> Yes, that is one problem.
>>>
>>>> Because we can allocate the dummy page on demand only when needed. It
>>>> just seems to me that keeping it per BO streamlines the code, as I
>>>> don't need to have different handling for local vs imported BOs.
>>>
>>> Why should you have a difference between local vs imported BOs?
>>
>>
>> For local BOs, Daniel's suggestion to use
>> vm_area_struct->vm_file->private_data should work, as this points to the
>> drm_file. For imported BOs, private_data will point to the dma_buf
>> structure, since each imported BO is backed by a pseudo file (created in
>> dma_buf_getfile).
>
> Oh, good point. But we could easily fix that now. That should make the mapping 
> code less complex as well.


Can you clarify what fix you have in mind? I assume it's not by altering
file->private_data to point to something else, as we need to retrieve the
dmabuf there (e.g. in dma_buf_mmap_internal).

Andrey
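
A minimal sketch of the on-demand, per-BO dummy page discussed above; the
dummy_page field and the helper name are hypothetical, not taken from the
actual patches:

/* Hypothetical helper: lazily allocate one zeroed dummy page per BO.
 * Assumes the caller holds the BO's reservation (as ttm_bo_vm_fault
 * does), which serializes the lazy allocation.
 */
static struct page *ttm_bo_get_dummy_page(struct ttm_buffer_object *bo)
{
        if (!bo->dummy_page)    /* assumed new field in ttm_buffer_object */
                bo->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);

        return bo->dummy_page;  /* NULL on allocation failure */
}

A fault arriving after the device is gone would then map every faulting
address of the BO to this one page (e.g. via vmf_insert_pfn()) instead of
the real backing store; imported and flink BOs would need no special
casing, since the page hangs off the BO itself.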


>
> Regards,
> Christian.
>
>> If so, where should we store the dummy RW BO in this case? In the current
>> implementation it's stored in the drm_gem_object.
>>
>> P.S. For the FLINK case it seems to me the handling should be no different
>> than with a local BO, as the FD used for mmap in this case is still the
>> same one associated with the DRM file.
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-06-22  9:53   ` Daniel Vetter
@ 2020-11-17 18:38     ` Andrey Grodzovsky
  2020-11-17 18:52       ` Daniel Vetter
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-17 18:38 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher


On 6/22/20 5:53 AM, Daniel Vetter wrote:
> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
>> No point to try recovery if device is gone, just messes up things.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
>>   2 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 6932d75..5d6d3d9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>   	return ret;
>>   }
>>   
>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>> +		struct amdgpu_ring *ring = adev->rings[i];
>> +
>> +		if (!ring || !ring->sched.thread)
>> +			continue;
>> +
>> +		cancel_delayed_work_sync(&ring->sched.work_tdr);
>> +	}
>> +}
> I think this is a function that's supposed to be in drm/scheduler, not
> here. Might also just be your cleanup code being ordered wrongly, or your
> split in one of the earlier patches not done quite right.
> -Daniel


This function iterates across all the schedulers per amdgpu device and accesses
amdgpu-specific structures; drm/scheduler deals with a single scheduler at most,
so this looks to me like the right place for this function.

Andrey


>
>> +
>>   static void
>>   amdgpu_pci_remove(struct pci_dev *pdev)
>>   {
>>   	struct drm_device *dev = pci_get_drvdata(pdev);
>> +	struct amdgpu_device *adev = dev->dev_private;
>>   
>>   	drm_dev_unplug(dev);
>> +	amdgpu_cancel_all_tdr(adev);
>>   	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>   	amdgpu_driver_unload_kms(dev);
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index 4720718..87ff0c0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -28,6 +28,8 @@
>>   #include "amdgpu.h"
>>   #include "amdgpu_trace.h"
>>   
>> +#include <drm/drm_drv.h>
>> +
>>   static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>   {
>>   	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>   
>>   	memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>   
>> +	if (drm_dev_is_unplugged(adev->ddev)) {
>> +		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
>> +					  s_job->sched->name);
>> +		return;
>> +	}
>> +
>>   	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
>>   		DRM_ERROR("ring %s timeout, but soft recovered\n",
>>   			  s_job->sched->name);
>> -- 
>> 2.7.4
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-17 18:38     ` Andrey Grodzovsky
@ 2020-11-17 18:52       ` Daniel Vetter
  2020-11-17 19:18         ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-11-17 18:52 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
> 
> On 6/22/20 5:53 AM, Daniel Vetter wrote:
> > On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
> > > No point to try recovery if device is gone, just messes up things.
> > > 
> > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
> > >   2 files changed, 24 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index 6932d75..5d6d3d9 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
> > >   	return ret;
> > >   }
> > > +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> > > +		struct amdgpu_ring *ring = adev->rings[i];
> > > +
> > > +		if (!ring || !ring->sched.thread)
> > > +			continue;
> > > +
> > > +		cancel_delayed_work_sync(&ring->sched.work_tdr);
> > > +	}
> > > +}
> > I think this is a function that's supposed to be in drm/scheduler, not
> > here. Might also just be your cleanup code being ordered wrongly, or your
> > split in one of the earlier patches not done quite right.
> > -Daniel
> 
> 
> This function iterates across all the schedulers  per amdgpu device and accesses
> amdgpu specific structures , drm/scheduler deals with single scheduler at most
> so looks to me like this is the right place for this function

I guess we could keep track of all schedulers somewhere in a list in
struct drm_device and wrap this up. That was kinda the idea.

Minimally I think we need a tiny wrapper with docs for the
cancel_delayed_work_sync(&sched->work_tdr) which explains what you must
observe to make sure there's no race. I'm not exactly sure there's any
guarantee here that we won't get a new tdr work launched right afterwards
at least, so this looks a bit like a hack.
-Daniel
-Daniel
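
A rough sketch of such a tiny documented wrapper; the name below is made
up, no such helper exists in drm/scheduler:

/**
 * drm_sched_cancel_pending_timeout - cancel pending TDR work for @sched
 * @sched: scheduler instance
 *
 * The caller must guarantee that nothing can arm new timeout work for
 * @sched after this returns (e.g. the device is unplugged and the ring
 * will never be used again); otherwise a fresh TDR can be scheduled
 * right after the cancel and this call is pointless.
 */
static inline void
drm_sched_cancel_pending_timeout(struct drm_gpu_scheduler *sched)
{
        cancel_delayed_work_sync(&sched->work_tdr);
}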

> 
> Andrey
> 
> 
> > 
> > > +
> > >   static void
> > >   amdgpu_pci_remove(struct pci_dev *pdev)
> > >   {
> > >   	struct drm_device *dev = pci_get_drvdata(pdev);
> > > +	struct amdgpu_device *adev = dev->dev_private;
> > >   	drm_dev_unplug(dev);
> > > +	amdgpu_cancel_all_tdr(adev);
> > >   	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
> > >   	amdgpu_driver_unload_kms(dev);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > index 4720718..87ff0c0 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > @@ -28,6 +28,8 @@
> > >   #include "amdgpu.h"
> > >   #include "amdgpu_trace.h"
> > > +#include <drm/drm_drv.h>
> > > +
> > >   static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > >   {
> > >   	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > > @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > >   	memset(&ti, 0, sizeof(struct amdgpu_task_info));
> > > +	if (drm_dev_is_unplugged(adev->ddev)) {
> > > +		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
> > > +					  s_job->sched->name);
> > > +		return;
> > > +	}
> > > +
> > >   	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
> > >   		DRM_ERROR("ring %s timeout, but soft recovered\n",
> > >   			  s_job->sched->name);
> > > -- 
> > > 2.7.4
> > > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-17 18:52       ` Daniel Vetter
@ 2020-11-17 19:18         ` Andrey Grodzovsky
  2020-11-17 19:49           ` Daniel Vetter
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-17 19:18 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher


On 11/17/20 1:52 PM, Daniel Vetter wrote:
> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
>>>> No point to try recovery if device is gone, just messes up things.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
>>>>    2 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> index 6932d75..5d6d3d9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>>>    	return ret;
>>>>    }
>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>>>> +{
>>>> +	int i;
>>>> +
>>>> +	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>> +		struct amdgpu_ring *ring = adev->rings[i];
>>>> +
>>>> +		if (!ring || !ring->sched.thread)
>>>> +			continue;
>>>> +
>>>> +		cancel_delayed_work_sync(&ring->sched.work_tdr);
>>>> +	}
>>>> +}
>>> I think this is a function that's supposed to be in drm/scheduler, not
>>> here. Might also just be your cleanup code being ordered wrongly, or your
>>> split in one of the earlier patches not done quite right.
>>> -Daniel
>>
>> This function iterates across all the schedulers  per amdgpu device and accesses
>> amdgpu specific structures , drm/scheduler deals with single scheduler at most
>> so looks to me like this is the right place for this function
> I guess we could keep track of all schedulers somewhere in a list in
> struct drm_device and wrap this up. That was kinda the idea.
>
> Minimally I think a tiny wrapper with docs for the
> cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
> observe to make sure there's no race.


Will do


> I'm not exactly sure there's no
> guarantee here we won't get a new tdr work launched right afterwards at
> least, so this looks a bit like a hack.


Note that for any TDR work happening post amdgpu_cancel_all_tdr,
amdgpu_job_timedout->drm_dev_is_unplugged will return true and so it will
return early. To make it watertight against races I can switch from
drm_dev_is_unplugged to drm_dev_enter/exit.

Andrey


> -Daniel
>
>> Andrey
>>
>>
>>>> +
>>>>    static void
>>>>    amdgpu_pci_remove(struct pci_dev *pdev)
>>>>    {
>>>>    	struct drm_device *dev = pci_get_drvdata(pdev);
>>>> +	struct amdgpu_device *adev = dev->dev_private;
>>>>    	drm_dev_unplug(dev);
>>>> +	amdgpu_cancel_all_tdr(adev);
>>>>    	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>>>    	amdgpu_driver_unload_kms(dev);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> index 4720718..87ff0c0 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> @@ -28,6 +28,8 @@
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_trace.h"
>>>> +#include <drm/drm_drv.h>
>>>> +
>>>>    static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>    {
>>>>    	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>>>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>    	memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>>> +	if (drm_dev_is_unplugged(adev->ddev)) {
>>>> +		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
>>>> +					  s_job->sched->name);
>>>> +		return;
>>>> +	}
>>>> +
>>>>    	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
>>>>    		DRM_ERROR("ring %s timeout, but soft recovered\n",
>>>>    			  s_job->sched->name);
>>>> -- 
>>>> 2.7.4
>>>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-17 19:18         ` Andrey Grodzovsky
@ 2020-11-17 19:49           ` Daniel Vetter
  2020-11-17 20:07             ` Andrey Grodzovsky
  2020-11-18  0:46             ` Luben Tuikov
  0 siblings, 2 replies; 97+ messages in thread
From: Daniel Vetter @ 2020-11-17 19:49 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
> 
> On 11/17/20 1:52 PM, Daniel Vetter wrote:
> > On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
> > > On 6/22/20 5:53 AM, Daniel Vetter wrote:
> > > > On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
> > > > > No point to try recovery if device is gone, just messes up things.
> > > > > 
> > > > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > > ---
> > > > >    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
> > > > >    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
> > > > >    2 files changed, 24 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > index 6932d75..5d6d3d9 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
> > > > >    	return ret;
> > > > >    }
> > > > > +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
> > > > > +{
> > > > > +	int i;
> > > > > +
> > > > > +	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> > > > > +		struct amdgpu_ring *ring = adev->rings[i];
> > > > > +
> > > > > +		if (!ring || !ring->sched.thread)
> > > > > +			continue;
> > > > > +
> > > > > +		cancel_delayed_work_sync(&ring->sched.work_tdr);
> > > > > +	}
> > > > > +}
> > > > I think this is a function that's supposed to be in drm/scheduler, not
> > > > here. Might also just be your cleanup code being ordered wrongly, or your
> > > > split in one of the earlier patches not done quite right.
> > > > -Daniel
> > > 
> > > This function iterates across all the schedulers  per amdgpu device and accesses
> > > amdgpu specific structures , drm/scheduler deals with single scheduler at most
> > > so looks to me like this is the right place for this function
> > I guess we could keep track of all schedulers somewhere in a list in
> > struct drm_device and wrap this up. That was kinda the idea.
> > 
> > Minimally I think a tiny wrapper with docs for the
> > cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
> > observe to make sure there's no race.
> 
> 
> Will do
> 
> 
> > I'm not exactly sure there's no
> > guarantee here we won't get a new tdr work launched right afterwards at
> > least, so this looks a bit like a hack.
> 
> 
> Note that for any TDR work happening post amdgpu_cancel_all_tdr
> amdgpu_job_timedout->drm_dev_is_unplugged
> will return true and so it will return early. To make it water proof tight
> against race
> i can switch from drm_dev_is_unplugged to drm_dev_enter/exit

Hm that's confusing. You do a cancel_work_sync, so that at least looks
like "tdr work must not run after this point".

If you only rely on the drm_dev_enter/exit check within the tdr work, then
there's no need to cancel anything.

For race free cancel_work_sync you need:
1. make sure whatever is calling schedule_work is guaranteed to no longer
call schedule_work.
2. call cancel_work_sync

Anything else is cargo-culted work cleanup:

- 1. without 2. means if a work got scheduled right before it'll still be
  a problem.
- 2. without 1. means a schedule_work right after makes you calling
  cancel_work_sync pointless.

So either both or nothing.
-Daniel
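
The two steps, as a generic sketch; dev, dev->lock, dev->stopped and
dev->work below are hypothetical stand-ins, not amdgpu or scheduler
structures:

/* Producer side: queueing is guarded by the same lock as the flag. */
spin_lock(&dev->lock);
if (!dev->stopped)
        schedule_work(&dev->work);
spin_unlock(&dev->lock);

/* Teardown, step 1: no new queueing can happen after this point. */
spin_lock(&dev->lock);
dev->stopped = true;
spin_unlock(&dev->lock);

/* Step 2: wait out anything queued or running before the flag flipped.
 * Only with step 1 in place is this a real "no work after this" barrier.
 */
cancel_work_sync(&dev->work);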

> 
> Andrey
> 
> 
> > -Daniel
> > 
> > > Andrey
> > > 
> > > 
> > > > > +
> > > > >    static void
> > > > >    amdgpu_pci_remove(struct pci_dev *pdev)
> > > > >    {
> > > > >    	struct drm_device *dev = pci_get_drvdata(pdev);
> > > > > +	struct amdgpu_device *adev = dev->dev_private;
> > > > >    	drm_dev_unplug(dev);
> > > > > +	amdgpu_cancel_all_tdr(adev);
> > > > >    	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
> > > > >    	amdgpu_driver_unload_kms(dev);
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > index 4720718..87ff0c0 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > > @@ -28,6 +28,8 @@
> > > > >    #include "amdgpu.h"
> > > > >    #include "amdgpu_trace.h"
> > > > > +#include <drm/drm_drv.h>
> > > > > +
> > > > >    static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > > > >    {
> > > > >    	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > > > > @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > > > >    	memset(&ti, 0, sizeof(struct amdgpu_task_info));
> > > > > +	if (drm_dev_is_unplugged(adev->ddev)) {
> > > > > +		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
> > > > > +					  s_job->sched->name);
> > > > > +		return;
> > > > > +	}
> > > > > +
> > > > >    	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
> > > > >    		DRM_ERROR("ring %s timeout, but soft recovered\n",
> > > > >    			  s_job->sched->name);
> > > > > -- 
> > > > > 2.7.4
> > > > > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-17 19:49           ` Daniel Vetter
@ 2020-11-17 20:07             ` Andrey Grodzovsky
  2020-11-18  7:39               ` Daniel Vetter
  2020-11-18  0:46             ` Luben Tuikov
  1 sibling, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-17 20:07 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	ckoenig.leichtzumerken, alexdeucher


On 11/17/20 2:49 PM, Daniel Vetter wrote:
> On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
>> On 11/17/20 1:52 PM, Daniel Vetter wrote:
>>> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
>>>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
>>>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
>>>>>> No point to try recovery if device is gone, just messes up things.
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
>>>>>>     2 files changed, 24 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> index 6932d75..5d6d3d9 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>>>>>     	return ret;
>>>>>>     }
>>>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>>>>>> +{
>>>>>> +	int i;
>>>>>> +
>>>>>> +	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>> +		struct amdgpu_ring *ring = adev->rings[i];
>>>>>> +
>>>>>> +		if (!ring || !ring->sched.thread)
>>>>>> +			continue;
>>>>>> +
>>>>>> +		cancel_delayed_work_sync(&ring->sched.work_tdr);
>>>>>> +	}
>>>>>> +}
>>>>> I think this is a function that's supposed to be in drm/scheduler, not
>>>>> here. Might also just be your cleanup code being ordered wrongly, or your
>>>>> split in one of the earlier patches not done quite right.
>>>>> -Daniel
>>>> This function iterates across all the schedulers  per amdgpu device and accesses
>>>> amdgpu specific structures , drm/scheduler deals with single scheduler at most
>>>> so looks to me like this is the right place for this function
>>> I guess we could keep track of all schedulers somewhere in a list in
>>> struct drm_device and wrap this up. That was kinda the idea.
>>>
>>> Minimally I think a tiny wrapper with docs for the
>>> cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
>>> observe to make sure there's no race.
>>
>> Will do
>>
>>
>>> I'm not exactly sure there's no
>>> guarantee here we won't get a new tdr work launched right afterwards at
>>> least, so this looks a bit like a hack.
>>
>> Note that for any TDR work happening post amdgpu_cancel_all_tdr
>> amdgpu_job_timedout->drm_dev_is_unplugged
>> will return true and so it will return early. To make it water proof tight
>> against race
>> i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
> Hm that's confusing. You do a work_cancel_sync, so that at least looks
> like "tdr work must not run after this point"
>
> If you only rely on drm_dev_enter/exit check with the tdr work, then
> there's no need to cancel anything.


Agree, synchronize_srcu from drm_dev_unplug should play the role
of 'flushing' any earlier (in progress) tdr work which is
using drm_dev_enter/exit pair. Any later arising tdr will terminate early when 
drm_dev_enter
returns false.

Will update.

Andrey
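
A minimal sketch of what that switch could look like in
amdgpu_job_timedout, assuming the recovery body itself stays unchanged:

static void amdgpu_job_timedout(struct drm_sched_job *s_job)
{
        struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
        int idx;

        /* Unlike a bare drm_dev_is_unplugged() check, holding the
         * drm_dev_enter() SRCU read section means drm_dev_unplug()'s
         * synchronize_srcu() waits for this handler to finish.
         */
        if (!drm_dev_enter(ring->adev->ddev, &idx))
                return;

        /* ... existing recovery path ... */

        drm_dev_exit(idx);
}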


>
> For race free cancel_work_sync you need:
> 1. make sure whatever is calling schedule_work is guaranteed to no longer
> call schedule_work.
> 2. call cancel_work_sync
>
> Anything else is cargo-culted work cleanup:
>
> - 1. without 2. means if a work got scheduled right before it'll still be
>    a problem.
> - 2. without 1. means a schedule_work right after makes you calling
>    cancel_work_sync pointless.
>
> So either both or nothing.
> -Daniel
>
>> Andrey
>>
>>
>>> -Daniel
>>>
>>>> Andrey
>>>>
>>>>
>>>>>> +
>>>>>>     static void
>>>>>>     amdgpu_pci_remove(struct pci_dev *pdev)
>>>>>>     {
>>>>>>     	struct drm_device *dev = pci_get_drvdata(pdev);
>>>>>> +	struct amdgpu_device *adev = dev->dev_private;
>>>>>>     	drm_dev_unplug(dev);
>>>>>> +	amdgpu_cancel_all_tdr(adev);
>>>>>>     	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>>>>>     	amdgpu_driver_unload_kms(dev);
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> index 4720718..87ff0c0 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> @@ -28,6 +28,8 @@
>>>>>>     #include "amdgpu.h"
>>>>>>     #include "amdgpu_trace.h"
>>>>>> +#include <drm/drm_drv.h>
>>>>>> +
>>>>>>     static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>     {
>>>>>>     	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>>>>>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>     	memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>>>>> +	if (drm_dev_is_unplugged(adev->ddev)) {
>>>>>> +		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
>>>>>> +					  s_job->sched->name);
>>>>>> +		return;
>>>>>> +	}
>>>>>> +
>>>>>>     	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
>>>>>>     		DRM_ERROR("ring %s timeout, but soft recovered\n",
>>>>>>     			  s_job->sched->name);
>>>>>> -- 
>>>>>> 2.7.4
>>>>>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-17 19:49           ` Daniel Vetter
  2020-11-17 20:07             ` Andrey Grodzovsky
@ 2020-11-18  0:46             ` Luben Tuikov
  1 sibling, 0 replies; 97+ messages in thread
From: Luben Tuikov @ 2020-11-18  0:46 UTC (permalink / raw)
  To: Daniel Vetter, Andrey Grodzovsky
  Cc: daniel.vetter, michel, amd-gfx, dri-devel, ckoenig.leichtzumerken

On 2020-11-17 2:49 p.m., Daniel Vetter wrote:
> On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
>>
>> On 11/17/20 1:52 PM, Daniel Vetter wrote:
>>> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
>>>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
>>>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
>>>>>> No point to try recovery if device is gone, just messes up things.
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
>>>>>>    2 files changed, 24 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> index 6932d75..5d6d3d9 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>>>>>    	return ret;
>>>>>>    }
>>>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>>>>>> +{
>>>>>> +	int i;
>>>>>> +
>>>>>> +	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>> +		struct amdgpu_ring *ring = adev->rings[i];
>>>>>> +
>>>>>> +		if (!ring || !ring->sched.thread)
>>>>>> +			continue;
>>>>>> +
>>>>>> +		cancel_delayed_work_sync(&ring->sched.work_tdr);
>>>>>> +	}
>>>>>> +}
>>>>> I think this is a function that's supposed to be in drm/scheduler, not
>>>>> here. Might also just be your cleanup code being ordered wrongly, or your
>>>>> split in one of the earlier patches not done quite right.
>>>>> -Daniel
>>>>
>>>> This function iterates across all the schedulers  per amdgpu device and accesses
>>>> amdgpu specific structures , drm/scheduler deals with single scheduler at most
>>>> so looks to me like this is the right place for this function
>>> I guess we could keep track of all schedulers somewhere in a list in
>>> struct drm_device and wrap this up. That was kinda the idea.
>>>
>>> Minimally I think a tiny wrapper with docs for the
>>> cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
>>> observe to make sure there's no race.
>>
>>
>> Will do
>>
>>
>>> I'm not exactly sure there's no
>>> guarantee here we won't get a new tdr work launched right afterwards at
>>> least, so this looks a bit like a hack.
>>
>>
>> Note that for any TDR work happening post amdgpu_cancel_all_tdr
>> amdgpu_job_timedout->drm_dev_is_unplugged
>> will return true and so it will return early. To make it water proof tight
>> against race
>> i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
> 
> Hm that's confusing. You do a work_cancel_sync, so that at least looks
> like "tdr work must not run after this point"
> 
> If you only rely on drm_dev_enter/exit check with the tdr work, then
> there's no need to cancel anything.
> 
> For race free cancel_work_sync you need:
> 1. make sure whatever is calling schedule_work is guaranteed to no longer
> call schedule_work.
> 2. call cancel_work_sync
> 
> Anything else is cargo-culted work cleanup:
> 
> - 1. without 2. means if a work got scheduled right before it'll still be
>   a problem.
> - 2. without 1. means a schedule_work right after makes you calling
>   cancel_work_sync pointless.

This is sound advice and I did something similar for SAS over a decade
ago where an expander could be disconnected from the domain via which
many IOs are flying to end devices.

You need a tiny DRM function which low-level drivers (such as amdgpu)
call in order to tell DRM that this device is not accepting commands
any more (sets a flag) and starts a thread to clean up commands
which are "done" or "incoming". At the same time, the low-level driver
returns commands which are pending in the hardware back out to
DRM (thus those commands become "done" from "pending"), and
DRM cleans them up.(*)

The point is that you're not bubbling up the error, but
directly notifying the highest level of the upper layer to hold off,
while you're cleaning up all incoming and pending commands.

Depending on the situation, case 1 above has two sub-cases:

a) the device will not come back--then cancel any new work
   back out to the application client, or
b) the device may come back again, i.e. it is being reset,
   then you can queue up work, assuming the device will
   come back on successfully and you'd be able to send
   the incoming requests down to it. Or cancel everything
   and let the application client do the queueing and
   resubmission, like in a). The latter will not work when this
   resubmission (and error recovery) is done without
   the knowledge of the application client, for instance
   communication or parity errors, protocol retries, etc.

(*) I've some work coming in, in the scheduler, which could make
this handling easier, or at least set a mechanism by which
this could be made easier.

Regards,
Luben
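
Very roughly, the "stop accepting, then drain" entry point being
described; every name below is hypothetical, no such DRM helper exists:

/* Hypothetical DRM-level helper the low-level driver would call. */
void drm_dev_stop_accepting(struct drm_device *dev)
{
        /* Reject all new command submissions from here on. */
        atomic_set(&dev->accepting, 0);         /* assumed new flag */

        /* Pull commands still pending in the hardware back out so they
         * become "done", then clean up "done" and "incoming" commands.
         */
        dev->driver->revoke_pending(dev);       /* assumed new callback */
}

Whether the rejected and revoked commands are then bounced back to the
application client (case a) or requeued for a device that may come back
(case b) would be decided by the caller.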

> 
> So either both or nothing.
> -Daniel
> 
>>
>> Andrey
>>
>>
>>> -Daniel
>>>
>>>> Andrey
>>>>
>>>>
>>>>>> +
>>>>>>    static void
>>>>>>    amdgpu_pci_remove(struct pci_dev *pdev)
>>>>>>    {
>>>>>>    	struct drm_device *dev = pci_get_drvdata(pdev);
>>>>>> +	struct amdgpu_device *adev = dev->dev_private;
>>>>>>    	drm_dev_unplug(dev);
>>>>>> +	amdgpu_cancel_all_tdr(adev);
>>>>>>    	ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>>>>>    	amdgpu_driver_unload_kms(dev);
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> index 4720718..87ff0c0 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> @@ -28,6 +28,8 @@
>>>>>>    #include "amdgpu.h"
>>>>>>    #include "amdgpu_trace.h"
>>>>>> +#include <drm/drm_drv.h>
>>>>>> +
>>>>>>    static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>    {
>>>>>>    	struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>>>>>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>    	memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>>>>> +	if (drm_dev_is_unplugged(adev->ddev)) {
>>>>>> +		DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
>>>>>> +					  s_job->sched->name);
>>>>>> +		return;
>>>>>> +	}
>>>>>> +
>>>>>>    	if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
>>>>>>    		DRM_ERROR("ring %s timeout, but soft recovered\n",
>>>>>>    			  s_job->sched->name);
>>>>>> -- 
>>>>>> 2.7.4
>>>>>>
> 

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-17 20:07             ` Andrey Grodzovsky
@ 2020-11-18  7:39               ` Daniel Vetter
  2020-11-18 12:01                 ` Christian König
  0 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-11-18  7:39 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Christian König, Michel Dänzer, dri-devel,
	Pekka Paalanen, amd-gfx list, Alex Deucher

On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
> On 11/17/20 2:49 PM, Daniel Vetter wrote:
> > On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
> >> On 11/17/20 1:52 PM, Daniel Vetter wrote:
> >>> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
> >>>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
> >>>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
> >>>>>> No point to try recovery if device is gone, just messes up things.
> >>>>>>
> >>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >>>>>> ---
> >>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
> >>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
> >>>>>>     2 files changed, 24 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>>> index 6932d75..5d6d3d9 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>>>>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
> >>>>>>          return ret;
> >>>>>>     }
> >>>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
> >>>>>> +{
> >>>>>> +        int i;
> >>>>>> +
> >>>>>> +        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> >>>>>> +                struct amdgpu_ring *ring = adev->rings[i];
> >>>>>> +
> >>>>>> +                if (!ring || !ring->sched.thread)
> >>>>>> +                        continue;
> >>>>>> +
> >>>>>> +                cancel_delayed_work_sync(&ring->sched.work_tdr);
> >>>>>> +        }
> >>>>>> +}
> >>>>> I think this is a function that's supposed to be in drm/scheduler, not
> >>>>> here. Might also just be your cleanup code being ordered wrongly, or your
> >>>>> split in one of the earlier patches not done quite right.
> >>>>> -Daniel
> >>>> This function iterates across all the schedulers  per amdgpu device and accesses
> >>>> amdgpu specific structures , drm/scheduler deals with single scheduler at most
> >>>> so looks to me like this is the right place for this function
> >>> I guess we could keep track of all schedulers somewhere in a list in
> >>> struct drm_device and wrap this up. That was kinda the idea.
> >>>
> >>> Minimally I think a tiny wrapper with docs for the
> >>> cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
> >>> observe to make sure there's no race.
> >>
> >> Will do
> >>
> >>
> >>> I'm not exactly sure there's no
> >>> guarantee here we won't get a new tdr work launched right afterwards at
> >>> least, so this looks a bit like a hack.
> >>
> >> Note that for any TDR work happening post amdgpu_cancel_all_tdr
> >> amdgpu_job_timedout->drm_dev_is_unplugged
> >> will return true and so it will return early. To make it water proof tight
> >> against race
> >> i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
> > Hm that's confusing. You do a work_cancel_sync, so that at least looks
> > like "tdr work must not run after this point"
> >
> > If you only rely on drm_dev_enter/exit check with the tdr work, then
> > there's no need to cancel anything.
>
>
> Agree, synchronize_srcu from drm_dev_unplug should play the role
> of 'flushing' any earlier (in progress) tdr work which is
> using drm_dev_enter/exit pair. Any later arising tdr will terminate early when
> drm_dev_enter
> returns false.

Nope, anything you put into the work itself cannot close this race.
It's the schedule_work that matters here. Or I'm missing something ...
I thought that the tdr work you're cancelling here is launched by
drm/scheduler code, not by the amd callback?
-Daniel

>
> Will update.
>
> Andrey
>
>
> >
> > For race free cancel_work_sync you need:
> > 1. make sure whatever is calling schedule_work is guaranteed to no longer
> > call schedule_work.
> > 2. call cancel_work_sync
> >
> > Anything else is cargo-culted work cleanup:
> >
> > - 1. without 2. means if a work got scheduled right before it'll still be
> >    a problem.
> > - 2. without 1. means a schedule_work right after makes you calling
> >    cancel_work_sync pointless.
> >
> > So either both or nothing.
> > -Daniel
> >
> >> Andrey
> >>
> >>
> >>> -Daniel
> >>>
> >>>> Andrey
> >>>>
> >>>>
> >>>>>> +
> >>>>>>     static void
> >>>>>>     amdgpu_pci_remove(struct pci_dev *pdev)
> >>>>>>     {
> >>>>>>          struct drm_device *dev = pci_get_drvdata(pdev);
> >>>>>> +        struct amdgpu_device *adev = dev->dev_private;
> >>>>>>          drm_dev_unplug(dev);
> >>>>>> +        amdgpu_cancel_all_tdr(adev);
> >>>>>>          ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
> >>>>>>          amdgpu_driver_unload_kms(dev);
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>>> index 4720718..87ff0c0 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> >>>>>> @@ -28,6 +28,8 @@
> >>>>>>     #include "amdgpu.h"
> >>>>>>     #include "amdgpu_trace.h"
> >>>>>> +#include <drm/drm_drv.h>
> >>>>>> +
> >>>>>>     static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> >>>>>>     {
> >>>>>>          struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> >>>>>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> >>>>>>          memset(&ti, 0, sizeof(struct amdgpu_task_info));
> >>>>>> +        if (drm_dev_is_unplugged(adev->ddev)) {
> >>>>>> +                DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
> >>>>>> +                                          s_job->sched->name);
> >>>>>> +                return;
> >>>>>> +        }
> >>>>>> +
> >>>>>>          if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
> >>>>>>                  DRM_ERROR("ring %s timeout, but soft recovered\n",
> >>>>>>                            s_job->sched->name);
> >>>>>> --
> >>>>>> 2.7.4
> >>>>>>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-18  7:39               ` Daniel Vetter
@ 2020-11-18 12:01                 ` Christian König
  2020-11-18 15:43                   ` Luben Tuikov
  2020-11-18 16:20                   ` Andrey Grodzovsky
  0 siblings, 2 replies; 97+ messages in thread
From: Christian König @ 2020-11-18 12:01 UTC (permalink / raw)
  To: Daniel Vetter, Andrey Grodzovsky
  Cc: Alex Deucher, Michel Dänzer, Pekka Paalanen, dri-devel,
	amd-gfx list

Am 18.11.20 um 08:39 schrieb Daniel Vetter:
> On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky
> <Andrey.Grodzovsky@amd.com> wrote:
>>
>> On 11/17/20 2:49 PM, Daniel Vetter wrote:
>>> On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
>>>> On 11/17/20 1:52 PM, Daniel Vetter wrote:
>>>>> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
>>>>>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
>>>>>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
>>>>>>>> No point to try recovery if device is gone, just messes up things.
>>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
>>>>>>>>      2 files changed, 24 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>> index 6932d75..5d6d3d9 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>>>>>>>           return ret;
>>>>>>>>      }
>>>>>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>>>>>>>> +{
>>>>>>>> +        int i;
>>>>>>>> +
>>>>>>>> +        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>>>> +                struct amdgpu_ring *ring = adev->rings[i];
>>>>>>>> +
>>>>>>>> +                if (!ring || !ring->sched.thread)
>>>>>>>> +                        continue;
>>>>>>>> +
>>>>>>>> +                cancel_delayed_work_sync(&ring->sched.work_tdr);
>>>>>>>> +        }
>>>>>>>> +}
>>>>>>> I think this is a function that's supposed to be in drm/scheduler, not
>>>>>>> here. Might also just be your cleanup code being ordered wrongly, or your
>>>>>>> split in one of the earlier patches not done quite right.
>>>>>>> -Daniel
>>>>>> This function iterates across all the schedulers  per amdgpu device and accesses
>>>>>> amdgpu specific structures , drm/scheduler deals with single scheduler at most
>>>>>> so looks to me like this is the right place for this function
>>>>> I guess we could keep track of all schedulers somewhere in a list in
>>>>> struct drm_device and wrap this up. That was kinda the idea.
>>>>>
>>>>> Minimally I think a tiny wrapper with docs for the
>>>>> cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
>>>>> observe to make sure there's no race.
>>>> Will do
>>>>
>>>>
>>>>> I'm not exactly sure there's no
>>>>> guarantee here we won't get a new tdr work launched right afterwards at
>>>>> least, so this looks a bit like a hack.
>>>> Note that for any TDR work happening post amdgpu_cancel_all_tdr
>>>> amdgpu_job_timedout->drm_dev_is_unplugged
>>>> will return true and so it will return early. To make it water proof tight
>>>> against race
>>>> i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
>>> Hm that's confusing. You do a work_cancel_sync, so that at least looks
>>> like "tdr work must not run after this point"
>>>
>>> If you only rely on drm_dev_enter/exit check with the tdr work, then
>>> there's no need to cancel anything.
>>
>> Agree, synchronize_srcu from drm_dev_unplug should play the role
>> of 'flushing' any earlier (in progress) tdr work which is
>> using drm_dev_enter/exit pair. Any later arising tdr will terminate early when
>> drm_dev_enter
>> returns false.
> Nope, anything you put into the work itself cannot close this race.
> It's the schedule_work that matters here. Or I'm missing something ...
> I thought that the tdr work you're cancelling here is launched by
> drm/scheduler code, not by the amd callback?

Yes, that is correct. Canceling the work item is not the right approach
at all, nor is adding a dev_enter/exit pair in the recovery handler.

What we need to do here is to stop the scheduler thread and then wait
for any timeout handling to have finished.

Otherwise it can schedule a new timeout just after we have canceled
this one.

Regards,
Christian.
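
One way to instantiate that ordering against the kthread-based scheduler
of the time (a sketch, per ring; error handling and the ring loop
omitted):

/* Park the scheduler thread first: a parked thread pushes no new jobs
 * and therefore arms no new timeouts ...
 */
kthread_park(ring->sched.thread);

/* ... so this cancel is now meaningful: once it returns, no timeout
 * handler is running or pending, and none can appear afterwards.
 */
cancel_delayed_work_sync(&ring->sched.work_tdr);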

> -Daniel
>
>> Will update.
>>
>> Andrey
>>
>>
>>> For race free cancel_work_sync you need:
>>> 1. make sure whatever is calling schedule_work is guaranteed to no longer
>>> call schedule_work.
>>> 2. call cancel_work_sync
>>>
>>> Anything else is cargo-culted work cleanup:
>>>
>>> - 1. without 2. means if a work got scheduled right before it'll still be
>>>     a problem.
>>> - 2. without 1. means a schedule_work right after makes you calling
>>>     cancel_work_sync pointless.
>>>
>>> So either both or nothing.
>>> -Daniel
>>>
>>>> Andrey
>>>>
>>>>
>>>>> -Daniel
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>> +
>>>>>>>>      static void
>>>>>>>>      amdgpu_pci_remove(struct pci_dev *pdev)
>>>>>>>>      {
>>>>>>>>           struct drm_device *dev = pci_get_drvdata(pdev);
>>>>>>>> +        struct amdgpu_device *adev = dev->dev_private;
>>>>>>>>           drm_dev_unplug(dev);
>>>>>>>> +        amdgpu_cancel_all_tdr(adev);
>>>>>>>>           ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>>>>>>>           amdgpu_driver_unload_kms(dev);
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>> index 4720718..87ff0c0 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>> @@ -28,6 +28,8 @@
>>>>>>>>      #include "amdgpu.h"
>>>>>>>>      #include "amdgpu_trace.h"
>>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>> +
>>>>>>>>      static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>>>      {
>>>>>>>>           struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>>>>>>>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>>>           memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>>>>>>> +        if (drm_dev_is_unplugged(adev->ddev)) {
>>>>>>>> +                DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
>>>>>>>> +                                          s_job->sched->name);
>>>>>>>> +                return;
>>>>>>>> +        }
>>>>>>>> +
>>>>>>>>           if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
>>>>>>>>                   DRM_ERROR("ring %s timeout, but soft recovered\n",
>>>>>>>>                             s_job->sched->name);
>>>>>>>> --
>>>>>>>> 2.7.4
>>>>>>>>
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-18 12:01                 ` Christian König
@ 2020-11-18 15:43                   ` Luben Tuikov
  2020-11-18 16:20                   ` Andrey Grodzovsky
  1 sibling, 0 replies; 97+ messages in thread
From: Luben Tuikov @ 2020-11-18 15:43 UTC (permalink / raw)
  To: christian.koenig, Daniel Vetter, Andrey Grodzovsky
  Cc: Michel Dänzer, amd-gfx list, dri-devel

On 2020-11-18 07:01, Christian König wrote:
> Am 18.11.20 um 08:39 schrieb Daniel Vetter:
>> On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky
>> <Andrey.Grodzovsky@amd.com> wrote:
>>>
>>> On 11/17/20 2:49 PM, Daniel Vetter wrote:
>>>> On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
>>>>> On 11/17/20 1:52 PM, Daniel Vetter wrote:
>>>>>> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
>>>>>>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
>>>>>>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
>>>>>>>>> No point to try recovery if device is gone, just messes up things.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
>>>>>>>>>      2 files changed, 24 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>> index 6932d75..5d6d3d9 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>>>>>>>>           return ret;
>>>>>>>>>      }
>>>>>>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>>>>>>>>> +{
>>>>>>>>> +        int i;
>>>>>>>>> +
>>>>>>>>> +        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>>>>> +                struct amdgpu_ring *ring = adev->rings[i];
>>>>>>>>> +
>>>>>>>>> +                if (!ring || !ring->sched.thread)
>>>>>>>>> +                        continue;
>>>>>>>>> +
>>>>>>>>> +                cancel_delayed_work_sync(&ring->sched.work_tdr);
>>>>>>>>> +        }
>>>>>>>>> +}
>>>>>>>> I think this is a function that's supposed to be in drm/scheduler, not
>>>>>>>> here. Might also just be your cleanup code being ordered wrongly, or your
>>>>>>>> split in one of the earlier patches not done quite right.
>>>>>>>> -Daniel
>>>>>>> This function iterates across all the schedulers  per amdgpu device and accesses
>>>>>>> amdgpu specific structures , drm/scheduler deals with single scheduler at most
>>>>>>> so looks to me like this is the right place for this function
>>>>>> I guess we could keep track of all schedulers somewhere in a list in
>>>>>> struct drm_device and wrap this up. That was kinda the idea.
>>>>>>
>>>>>> Minimally I think a tiny wrapper with docs for the
>>>>>> cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
>>>>>> observe to make sure there's no race.
>>>>> Will do
>>>>>
>>>>>
>>>>>> I'm not exactly sure there's no
>>>>>> guarantee here we won't get a new tdr work launched right afterwards at
>>>>>> least, so this looks a bit like a hack.
>>>>> Note that for any TDR work happening post amdgpu_cancel_all_tdr
>>>>> amdgpu_job_timedout->drm_dev_is_unplugged
>>>>> will return true and so it will return early. To make it water proof tight
>>>>> against race
>>>>> i can switch from drm_dev_is_unplugged to drm_dev_enter/exit
>>>> Hm that's confusing. You do a work_cancel_sync, so that at least looks
>>>> like "tdr work must not run after this point"
>>>>
>>>> If you only rely on drm_dev_enter/exit check with the tdr work, then
>>>> there's no need to cancel anything.
>>>
>>> Agree, synchronize_srcu from drm_dev_unplug should play the role
>>> of 'flushing' any earlier (in progress) tdr work which is
>>> using drm_dev_enter/exit pair. Any later arising tdr will terminate early when
>>> drm_dev_enter
>>> returns false.
>> Nope, anything you put into the work itself cannot close this race.
>> It's the schedule_work that matters here. Or I'm missing something ...
>> I thought that the tdr work you're cancelling here is launched by
>> drm/scheduler code, not by the amd callback?
> 
> Yes that is correct. Canceling the work item is not the right approach 
> at all, nor is adding dev_enter/exit pair in the recovery handler.
> 
> What we need to do here is to stop the scheduler thread and then wait 
> for any timeout handling to have finished.
> 
> Otherwise it can scheduler a new timeout just after we have canceled 
> this one.

Yep, that's exactly what I said in my email above.

Regards,
Luben

> 
> Regards,
> Christian.
> 
>> -Daniel
>>
>>> Will update.
>>>
>>> Andrey
>>>
>>>
>>>> For race free cancel_work_sync you need:
>>>> 1. make sure whatever is calling schedule_work is guaranteed to no longer
>>>> call schedule_work.
>>>> 2. call cancel_work_sync
>>>>
>>>> Anything else is cargo-culted work cleanup:
>>>>
>>>> - 1. without 2. means if a work got scheduled right before it'll still be
>>>>     a problem.
>>>> - 2. without 1. means a schedule_work right after makes you calling
>>>>     cancel_work_sync pointless.
>>>>
>>>> So either both or nothing.
>>>> -Daniel
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>> +
>>>>>>>>>      static void
>>>>>>>>>      amdgpu_pci_remove(struct pci_dev *pdev)
>>>>>>>>>      {
>>>>>>>>>           struct drm_device *dev = pci_get_drvdata(pdev);
>>>>>>>>> +        struct amdgpu_device *adev = dev->dev_private;
>>>>>>>>>           drm_dev_unplug(dev);
>>>>>>>>> +        amdgpu_cancel_all_tdr(adev);
>>>>>>>>>           ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>>>>>>>>           amdgpu_driver_unload_kms(dev);
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>> index 4720718..87ff0c0 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>> @@ -28,6 +28,8 @@
>>>>>>>>>      #include "amdgpu.h"
>>>>>>>>>      #include "amdgpu_trace.h"
>>>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>>> +
>>>>>>>>>      static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>>>>      {
>>>>>>>>>           struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>>>>>>>>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>>>>           memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>>>>>>>> +        if (drm_dev_is_unplugged(adev->ddev)) {
>>>>>>>>> +                DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
>>>>>>>>> +                                          s_job->sched->name);
>>>>>>>>> +                return;
>>>>>>>>> +        }
>>>>>>>>> +
>>>>>>>>>           if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
>>>>>>>>>                   DRM_ERROR("ring %s timeout, but soft recovered\n",
>>>>>>>>>                             s_job->sched->name);
>>>>>>>>> --
>>>>>>>>> 2.7.4
>>>>>>>>>
>>
>>
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-18 12:01                 ` Christian König
  2020-11-18 15:43                   ` Luben Tuikov
@ 2020-11-18 16:20                   ` Andrey Grodzovsky
  2020-11-19  7:55                     ` Christian König
  1 sibling, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-18 16:20 UTC (permalink / raw)
  To: christian.koenig, Daniel Vetter
  Cc: Alex Deucher, Michel Dänzer, Pekka Paalanen, dri-devel,
	amd-gfx list


On 11/18/20 7:01 AM, Christian König wrote:
> Am 18.11.20 um 08:39 schrieb Daniel Vetter:
>> On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky
>> <Andrey.Grodzovsky@amd.com> wrote:
>>>
>>> On 11/17/20 2:49 PM, Daniel Vetter wrote:
>>>> On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
>>>>> On 11/17/20 1:52 PM, Daniel Vetter wrote:
>>>>>> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
>>>>>>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
>>>>>>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
>>>>>>>>> No point to try recovery if device is gone, just messes up things.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>> ---
>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 ++++++++
>>>>>>>>>      2 files changed, 24 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>> index 6932d75..5d6d3d9 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>>>>>>>>           return ret;
>>>>>>>>>      }
>>>>>>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>>>>>>>>> +{
>>>>>>>>> +        int i;
>>>>>>>>> +
>>>>>>>>> +        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>>>>> +                struct amdgpu_ring *ring = adev->rings[i];
>>>>>>>>> +
>>>>>>>>> +                if (!ring || !ring->sched.thread)
>>>>>>>>> +                        continue;
>>>>>>>>> +
>>>>>>>>> +                cancel_delayed_work_sync(&ring->sched.work_tdr);
>>>>>>>>> +        }
>>>>>>>>> +}
>>>>>>>> I think this is a function that's supposed to be in drm/scheduler, not
>>>>>>>> here. Might also just be your cleanup code being ordered wrongly, or your
>>>>>>>> split in one of the earlier patches not done quite right.
>>>>>>>> -Daniel
>>>>>>> This function iterates across all the schedulers  per amdgpu device and 
>>>>>>> accesses
>>>>>>> amdgpu specific structures , drm/scheduler deals with single scheduler 
>>>>>>> at most
>>>>>>> so looks to me like this is the right place for this function
>>>>>> I guess we could keep track of all schedulers somewhere in a list in
>>>>>> struct drm_device and wrap this up. That was kinda the idea.
>>>>>>
>>>>>> Minimally I think a tiny wrapper with docs for the
>>>>>> cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
>>>>>> observe to make sure there's no race.
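
A hedged sketch of such a wrapper; the name and kerneldoc are illustrative, not from the series:

/**
 * drm_sched_cancel_tdr - cancel a scheduler's pending timeout work
 * @sched: scheduler whose work_tdr should be cancelled
 *
 * The caller must guarantee that nothing can re-arm @sched->work_tdr
 * after this returns (e.g. the scheduler thread is parked and
 * drm_dev_unplug() has completed), otherwise the cancel races with a
 * fresh schedule of the same work item.
 */
static inline void drm_sched_cancel_tdr(struct drm_gpu_scheduler *sched)
{
        cancel_delayed_work_sync(&sched->work_tdr);
}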
>>>>> Will do
>>>>>
>>>>>
>>>>>> I'm not exactly sure there's no
>>>>>> guarantee here we won't get a new tdr work launched right afterwards at
>>>>>> least, so this looks a bit like a hack.
>>>>> Note that for any TDR work happening post amdgpu_cancel_all_tdr
>>>>> amdgpu_job_timedout->drm_dev_is_unplugged
>>>>> will return true and so it will return early. To make it watertight
>>>>> against races
>>>>> I can switch from drm_dev_is_unplugged to drm_dev_enter/exit
>>>> Hm that's confusing. You do a work_cancel_sync, so that at least looks
>>>> like "tdr work must not run after this point"
>>>>
>>>> If you only rely on drm_dev_enter/exit check with the tdr work, then
>>>> there's no need to cancel anything.
>>>
>>> Agree, synchronize_srcu from drm_dev_unplug should play the role
>>> of 'flushing' any earlier (in progress) tdr work which is
>>> using drm_dev_enter/exit pair. Any later arising tdr will terminate early when
>>> drm_dev_enter
>>> returns false.
>> Nope, anything you put into the work itself cannot close this race.
>> It's the schedule_work that matters here. Or I'm missing something ...
>> I thought that the tdr work you're cancelling here is launched by
>> drm/scheduler code, not by the amd callback?


My bad, you are right, I am supposed to put the drm_dev_enter/exit pair into 
drm_sched_job_timedout


>
> Yes that is correct. Canceling the work item is not the right approach at all, 
> nor is adding dev_enter/exit pair in the recovery handler.


Without adding the dev_enter/exit guarding pair in the recovery handler you 
end up with a GPU reset starting while
the device is already unplugged, which leads to multiple errors and general mess.


>
> What we need to do here is to stop the scheduler thread and then wait for any 
> timeout handling to have finished.
>
> Otherwise it can schedule a new timeout just after we have canceled this one.
>
> Regards,
> Christian.


Schedulers are stopped from amdgpu_driver_unload_kms, which indeed happens after 
drm_dev_unplug,
so yes, there is still a chance for new work to be scheduled and a timeout armed 
after that. But once I fix the code
to place the drm_dev_enter/exit pair into drm_sched_job_timedout, I don't see why 
that is not a good solution?
Any tdr work started after drm_dev_unplug finished will simply abort on entry to 
drm_sched_job_timedout,
because drm_dev_enter will return false and the function will return without 
rearming the timeout timer, and
so will have no impact.

The only issue I see here now is a possible use-after-free if some late tdr 
work tries to execute after the
drm device is already gone; for this we should probably add 
cancel_delayed_work_sync(&sched->work_tdr)
to drm_sched_fini after sched->thread is stopped there.

Andrey
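
For reference, a rough sketch of the guard proposed for drm_sched_job_timedout; the body is abbreviated, and sched_to_drm_dev() is a hypothetical accessor since drm_gpu_scheduler carries no drm_device pointer at this point:

static void drm_sched_job_timedout(struct work_struct *work)
{
        struct drm_gpu_scheduler *sched =
                container_of(work, struct drm_gpu_scheduler, work_tdr.work);
        int idx;

        /* Bail out and, crucially, do not re-arm the timer once the
         * device is unplugged; drm_dev_unplug()'s synchronize_srcu()
         * flushes handlers that are already past this check. */
        if (!drm_dev_enter(sched_to_drm_dev(sched), &idx))
                return;

        /* ... find the oldest pending job and call
         * sched->ops->timedout_job() on it ... */

        drm_dev_exit(idx);
}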


>
>> -Daniel
>>
>>> Will update.
>>>
>>> Andrey
>>>
>>>
>>>> For race free cancel_work_sync you need:
>>>> 1. make sure whatever is calling schedule_work is guaranteed to no longer
>>>> call schedule_work.
>>>> 2. call cancel_work_sync
>>>>
>>>> Anything else is cargo-culted work cleanup:
>>>>
>>>> - 1. without 2. means if a work got scheduled right before it'll still be
>>>>     a problem.
>>>> - 2. without 1. means a schedule_work right after makes you calling
>>>>     cancel_work_sync pointless.
>>>>
>>>> So either both or nothing.
>>>> -Daniel
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>>> +
>>>>>>>>>      static void
>>>>>>>>>      amdgpu_pci_remove(struct pci_dev *pdev)
>>>>>>>>>      {
>>>>>>>>>           struct drm_device *dev = pci_get_drvdata(pdev);
>>>>>>>>> +        struct amdgpu_device *adev = dev->dev_private;
>>>>>>>>>           drm_dev_unplug(dev);
>>>>>>>>> +        amdgpu_cancel_all_tdr(adev);
>>>>>>>>>           ttm_bo_unmap_virtual_address_space(&adev->mman.bdev);
>>>>>>>>>           amdgpu_driver_unload_kms(dev);
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>> index 4720718..87ff0c0 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>>>>> @@ -28,6 +28,8 @@
>>>>>>>>>      #include "amdgpu.h"
>>>>>>>>>      #include "amdgpu_trace.h"
>>>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>>> +
>>>>>>>>>      static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>>>>      {
>>>>>>>>>           struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
>>>>>>>>> @@ -37,6 +39,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
>>>>>>>>>           memset(&ti, 0, sizeof(struct amdgpu_task_info));
>>>>>>>>> +        if (drm_dev_is_unplugged(adev->ddev)) {
>>>>>>>>> +                DRM_INFO("ring %s timeout, but device unplugged, skipping.\n",
>>>>>>>>> +                                          s_job->sched->name);
>>>>>>>>> +                return;
>>>>>>>>> +        }
>>>>>>>>> +
>>>>>>>>>           if (amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) {
>>>>>>>>>                   DRM_ERROR("ring %s timeout, but soft recovered\n",
>>>>>>>>>                             s_job->sched->name);
>>>>>>>>> -- 
>>>>>>>>> 2.7.4
>>>>>>>>>
>>
>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-18 16:20                   ` Andrey Grodzovsky
@ 2020-11-19  7:55                     ` Christian König
  2020-11-19 15:02                       ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Christian König @ 2020-11-19  7:55 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, Daniel Vetter
  Cc: Alex Deucher, Michel Dänzer, amd-gfx list, Pekka Paalanen,
	dri-devel

Am 18.11.20 um 17:20 schrieb Andrey Grodzovsky:
>
> On 11/18/20 7:01 AM, Christian König wrote:
>> Am 18.11.20 um 08:39 schrieb Daniel Vetter:
>>> On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky
>>> <Andrey.Grodzovsky@amd.com> wrote:
>>>>
>>>> On 11/17/20 2:49 PM, Daniel Vetter wrote:
>>>>> On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
>>>>>> On 11/17/20 1:52 PM, Daniel Vetter wrote:
>>>>>>> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
>>>>>>>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
>>>>>>>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky 
>>>>>>>>> wrote:
>>>>>>>>>> No point to try recovery if device is gone, just messes up 
>>>>>>>>>> things.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>> ---
>>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 
>>>>>>>>>> ++++++++++++++++
>>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++++++++
>>>>>>>>>>      2 files changed, 24 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>> index 6932d75..5d6d3d9 100644
>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct 
>>>>>>>>>> pci_dev *pdev,
>>>>>>>>>>           return ret;
>>>>>>>>>>      }
>>>>>>>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>>>>>>>>>> +{
>>>>>>>>>> +        int i;
>>>>>>>>>> +
>>>>>>>>>> +        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>>>>>> +                struct amdgpu_ring *ring = adev->rings[i];
>>>>>>>>>> +
>>>>>>>>>> +                if (!ring || !ring->sched.thread)
>>>>>>>>>> +                        continue;
>>>>>>>>>> +
>>>>>>>>>> + cancel_delayed_work_sync(&ring->sched.work_tdr);
>>>>>>>>>> +        }
>>>>>>>>>> +}
>>>>>>>>> I think this is a function that's supposed to be in 
>>>>>>>>> drm/scheduler, not
>>>>>>>>> here. Might also just be your cleanup code being ordered 
>>>>>>>>> wrongly, or your
>>>>>>>>> split in one of the earlier patches not done quite right.
>>>>>>>>> -Daniel
>>>>>>>> This function iterates across all the schedulers per amdgpu 
>>>>>>>> device and accesses
>>>>>>>> amdgpu specific structures , drm/scheduler deals with single 
>>>>>>>> scheduler at most
>>>>>>>> so looks to me like this is the right place for this function
>>>>>>> I guess we could keep track of all schedulers somewhere in a 
>>>>>>> list in
>>>>>>> struct drm_device and wrap this up. That was kinda the idea.
>>>>>>>
>>>>>>> Minimally I think a tiny wrapper with docs for the
>>>>>>> cancel_delayed_work_sync(&sched->work_tdr); which explains what 
>>>>>>> you must
>>>>>>> observe to make sure there's no race.
>>>>>> Will do
>>>>>>
>>>>>>
>>>>>>> I'm not exactly sure there's no
>>>>>>> guarantee here we won't get a new tdr work launched right 
>>>>>>> afterwards at
>>>>>>> least, so this looks a bit like a hack.
>>>>>> Note that for any TDR work happening post amdgpu_cancel_all_tdr
>>>>>> amdgpu_job_timedout->drm_dev_is_unplugged
>>>>>> will return true and so it will return early. To make it
>>>>>> watertight
>>>>>> against races
>>>>>> I can switch from drm_dev_is_unplugged to drm_dev_enter/exit
>>>>> Hm that's confusing. You do a work_cancel_sync, so that at least 
>>>>> looks
>>>>> like "tdr work must not run after this point"
>>>>>
>>>>> If you only rely on drm_dev_enter/exit check with the tdr work, then
>>>>> there's no need to cancel anything.
>>>>
>>>> Agree, synchronize_srcu from drm_dev_unplug should play the role
>>>> of 'flushing' any earlier (in progress) tdr work which is
>>>> using drm_dev_enter/exit pair. Any later arising tdr will terminate 
>>>> early when
>>>> drm_dev_enter
>>>> returns false.
>>> Nope, anything you put into the work itself cannot close this race.
>>> It's the schedule_work that matters here. Or I'm missing something ...
>>> I thought that the tdr work you're cancelling here is launched by
>>> drm/scheduler code, not by the amd callback?
>
>
> My bad, you are right, I am supposed to put the drm_dev_enter/exit pair 
> into drm_sched_job_timedout
>
>
>>
>> Yes that is correct. Canceling the work item is not the right 
>> approach at all, nor is adding dev_enter/exit pair in the recovery 
>> handler.
>
>
> Without adding the dev_enter/exit guarding pair in the recovery 
> handler you are ending up with GPU reset starting while
> the device is already unplugged, this leads to multiple errors and 
> general mess.
>
>
>>
>> What we need to do here is to stop the scheduler thread and then wait 
>> for any timeout handling to have finished.
>>
>> Otherwise it can schedule a new timeout just after we have canceled 
>> this one.
>>
>> Regards,
>> Christian.
>
>
> Schedulers are stopped from amdgpu_driver_unload_kms which indeed 
> happens after drm_dev_unplug
> so yes, there is still a chance for new work being scheduled and a 
> timeout armed after that but, once I fix the code
> to place the drm_dev_enter/exit pair into drm_sched_job_timedout I don't 
> see why that is not a good solution ?

Yeah that should work as well, but then you also don't need to cancel 
the work item from the driver.

> Any tdr work started after drm_dev_unplug finished will simply abort 
> on entry to drm_sched_job_timedout
> because drm_dev_enter will be false and the function will return 
> without rearming the timeout timer and
> so will have no impact.
>
> The only issue I see here now is a possible use-after-free if some 
> late tdr work tries to execute after the
> drm device is already gone; for this we should probably add 
> cancel_delayed_work_sync(&sched->work_tdr)
> to drm_sched_fini after sched->thread is stopped there.

Good point, that is indeed missing as far as I can see.

Christian.

>
> Andrey
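
A minimal sketch of that missing cleanup in drm_sched_fini, with the surrounding upstream details elided:

void drm_sched_fini(struct drm_gpu_scheduler *sched)
{
        if (sched->thread)
                kthread_stop(sched->thread);

        /* Flush any timeout work that raced with teardown so it
         * cannot run after the scheduler's memory is freed. */
        cancel_delayed_work_sync(&sched->work_tdr);

        sched->ready = false;
}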

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 1/8] drm: Add dummy page per device or GEM object
  2020-11-16 20:42                         ` Andrey Grodzovsky
@ 2020-11-19 10:01                           ` Christian König
  0 siblings, 0 replies; 97+ messages in thread
From: Christian König @ 2020-11-19 10:01 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: Alex Deucher, Michel Dänzer, amd-gfx list, Pekka Paalanen,
	dri-devel

Am 16.11.20 um 21:42 schrieb Andrey Grodzovsky:
>
> On 11/16/20 3:36 PM, Christian König wrote:
>> Am 16.11.20 um 20:00 schrieb Andrey Grodzovsky:
>>>
>>> On 11/16/20 4:48 AM, Christian König wrote:
>>>> Am 15.11.20 um 07:34 schrieb Andrey Grodzovsky:
>>>>>
>>>>> On 11/14/20 4:51 AM, Daniel Vetter wrote:
>>>>>> On Sat, Nov 14, 2020 at 9:41 AM Christian König
>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>>> Am 13.11.20 um 21:52 schrieb Andrey Grodzovsky:
>>>>>>>> On 6/22/20 1:50 PM, Daniel Vetter wrote:
>>>>>>>>> On Mon, Jun 22, 2020 at 7:45 PM Christian König
>>>>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>>>> Am 22.06.20 um 16:32 schrieb Andrey Grodzovsky:
>>>>>>>>>>> On 6/22/20 9:18 AM, Christian König wrote:
>>>>>>>>>>>> Am 21.06.20 um 08:03 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>>> device is removed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c 
>>>>>>>>>>>>> b/drivers/gpu/drm/drm_file.c
>>>>>>>>>>>>> index c4c704e..67c0770 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>>>>>>> @@ -188,6 +188,12 @@ struct drm_file *drm_file_alloc(struct
>>>>>>>>>>>>> drm_minor *minor)
>>>>>>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>>>>>>         }
>>>>>>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | 
>>>>>>>>>>>>> __GFP_ZERO);
>>>>>>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>>>>>>> +        ret = -ENOMEM;
>>>>>>>>>>>>> +        goto out_prime_destroy;
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +
>>>>>>>>>>>>>         return file;
>>>>>>>>>>>>>       out_prime_destroy:
>>>>>>>>>>>>> @@ -284,6 +290,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>>>>>> dev->driver->postclose(dev, file);
>>>>>>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> drm_prime_destroy_file_private(&file->prime);
>>>>>>>>>>>>> WARN_ON(!list_empty(&file->event_list));
>>>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>>>> b/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>>>> index 1de2cde..c482e9c 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct
>>>>>>>>>>>>> drm_device *dev,
>>>>>>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>>>>>>                 dma_buf, *handle);
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +    if (!ret) {
>>>>>>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | 
>>>>>>>>>>>>> __GFP_ZERO);
>>>>>>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>>>>>>> +            ret = -ENOMEM;
>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> +
>>>>>>>>>>>> While the per file case still looks acceptable this is a 
>>>>>>>>>>>> clear NAK
>>>>>>>>>>>> since it will massively increase the memory needed for a prime
>>>>>>>>>>>> exported object.
>>>>>>>>>>>>
>>>>>>>>>>>> I think that this is quite overkill in the first place and 
>>>>>>>>>>>> for the
>>>>>>>>>>>> hot unplug case we can just use the global dummy page as well.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>> Global dummy page is good for read access, what do you do on 
>>>>>>>>>>> write
>>>>>>>>>>> access ? My first approach was indeed to map at first global 
>>>>>>>>>>> dummy
>>>>>>>>>>> page as read only and mark the vma->vm_flags as !VM_SHARED 
>>>>>>>>>>> assuming
>>>>>>>>>>> that this would trigger Copy On Write flow in core mm
>>>>>>>>>>> (https://elixir.bootlin.com/linux/v5.7-rc7/source/mm/memory.c#L3977) 
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> on the next page fault to same address triggered by a write 
>>>>>>>>>>> access but
>>>>>>>>>>> then i realized a new COW page will be allocated for each 
>>>>>>>>>>> such mapping
>>>>>>>>>>> and this is much more wasteful then having a dedicated page 
>>>>>>>>>>> per GEM
>>>>>>>>>>> object.
>>>>>>>>>> Yeah, but this is only for a very very small corner cases. 
>>>>>>>>>> What we need
>>>>>>>>>> to prevent is increasing the memory usage during normal 
>>>>>>>>>> operation to
>>>>>>>>>> much.
>>>>>>>>>>
>>>>>>>>>> Using memory during the unplug is completely unproblematic 
>>>>>>>>>> because we
>>>>>>>>>> just released quite a bunch of it by releasing all those 
>>>>>>>>>> system memory
>>>>>>>>>> buffers.
>>>>>>>>>>
>>>>>>>>>> And I'm pretty sure that COWed pages are correctly accounted 
>>>>>>>>>> towards
>>>>>>>>>> the
>>>>>>>>>> used memory of a process.
>>>>>>>>>>
>>>>>>>>>> So I think if that approach works as intended and the COW 
>>>>>>>>>> pages are
>>>>>>>>>> released again on unmapping it would be the perfect solution 
>>>>>>>>>> to the
>>>>>>>>>> problem.
>>>>>>>>>>
>>>>>>>>>> Daniel what do you think?
>>>>>>>>> If COW works, sure sounds reasonable. And if we can make sure we
>>>>>>>>> managed to drop all the system allocations (otherwise suddenly 2x
>>>>>>>>> memory usage, worst case). But I have no idea whether we can
>>>>>>>>> retroshoehorn that into an established vma, you might have fun 
>>>>>>>>> stuff
>>>>>>>>> like a mkwrite handler there (which I thought is the COW handler
>>>>>>>>> thing, but really no idea).
>>>>>>>>>
>>>>>>>>> If we need to massively change stuff then I think rw dummy page,
>>>>>>>>> allocated on first fault after hotunplug (maybe just make it 
>>>>>>>>> one per
>>>>>>>>> object, that's simplest) seems like the much safer option. 
>>>>>>>>> Much less
>>>>>>>>> code that can go wrong.
>>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>> Regarding COW, i was looking into how to properly implement it 
>>>>>>>> from
>>>>>>>> within the fault handler (i.e. ttm_bo_vm_fault)
>>>>>>>> and the main obstacle I hit is that of exclusive access to the
>>>>>>>> vm_area_struct, i need to be able to modify
>>>>>>>> vma->vm_flags (and vm_page_prot)  to remove VM_SHARED bit so 
>>>>>>>> COW can
>>>>>>>> be triggered on subsequent write access
>>>>>>>> fault (here
>>>>>>>> https://elixir.bootlin.com/linux/latest/source/mm/memory.c#L4128) 
>>>>>>>>
>>>>>>>> but core mm takes only read side mm_sem (here for example
>>>>>>>> https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd/iommu_v2.c#L488) 
>>>>>>>>
>>>>>>>> and so I am not supposed to modify vm_area_struct in this case. 
>>>>>>>> I am
>>>>>>>> not sure if it's legit to write lock tthe mm_sem from this point.
>>>>>>>> I found some discussions about this here
>>>>>>>> http://lkml.iu.edu/hypermail/linux/kernel/1909.1/02754.html 
>>>>>>>> but it
>>>>>>>> wasn't really clear to me
>>>>>>>> what's the solution.
>>>>>>>>
>>>>>>>> In any case, seems to me that easier and more memory saving 
>>>>>>>> solution
>>>>>>>> would be to just switch to per ttm bo dummy rw page that
>>>>>>>> would be allocated on demand as you suggested here. This should 
>>>>>>>> also
>>>>>>>> take care of imported BOs and flink cases.
>>>>>>>> Then i can drop the per device FD and per GEM object FD dummy 
>>>>>>>> BO and
>>>>>>>> the ugly loop i am using in patch 2 to match faulting BO to the 
>>>>>>>> right
>>>>>>>> dummy page.
>>>>>>>>
>>>>>>>> Does this make sense ?
>>>>>>> I still don't see the information leak as much of a problem, but if
>>>>>>> Daniel insists we should probably do this.
>>>>>> Well amdgpu doesn't clear buffers by default, so indeed you guys 
>>>>>> are a
>>>>>> lot more laissez-faire here. But in general we really don't do that
>>>>>> kind of leaking. Iirc there are even radeonsi bugs because of missing 
>>>>>> clears,
>>>>>> and radeonsi happily displays gunk :-)
>>>>>>
>>>>>>> But could we at least have only one page per client instead of 
>>>>>>> per BO?
>>>>>> I think you can do one page per file descriptor or something like
>>>>>> that. But gets annoying with shared bo, especially with dma_buf_mmap
>>>>>> forwarding.
>>>>>> -Daniel
>>>>>
>>>>>
>>>>> Christian - is your concern more with too many page allocations or 
>>>>> with the extra pointer member
>>>>> cluttering the TTM BO struct ?
>>>>
>>>> Yes, that is one problem.
>>>>
>>>>> Because we can allocate the dummy page on demand only when
>>>>> needed. It just seems to me that keeping it per BO streamlines 
>>>>> the code as I don't need to
>>>>> have different handling for local vs imported BOs.
>>>>
>>>> Why should you have a difference between local vs imported BOs?
>>>
>>>
>>> For local BO seems like Daniel's suggestion to use 
>>> vm_area_struct->vm_file->private_data
>>> should work as this points to drm_file. For imported BOs 
>>> private_data will point to dma_buf structure
>>> since each imported BO is backed by a pseudo file (created in 
>>> dma_buf_getfile).
>>
>> Oh, good point. But we could easily fix that now. That should make 
>> the mapping code less complex as well.
>
>
> Can you clarify what fix you have in mind ? I assume it's not by 
> altering file->private_data to point
> to something else as we need to retrieve dmabuf (e.g. 
> dma_buf_mmap_internal)

Ah, crap. You are right, that is really tricky because vma->vm_file 
doesn't point to something useful in this situation.

I was talking about the new vma_set_file() function I've just pushed to 
drm-misc-next, but that stuff can't be used here.

I still don't see the need to use more than the global dummy page, even 
if that means an information leak between processes on unplug.

Christian.
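
For reference, a hedged sketch of routing a fault to an on-demand per-BO dummy page as discussed in this thread; the dummy_page field and the helper name are hypothetical, not the code from this series:

static vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf,
                                       struct ttm_buffer_object *bo)
{
        struct page *page = bo->dummy_page;

        if (!page) {
                /* Allocated only after unplug, so the normal case pays
                 * nothing; freed together with the BO. */
                page = alloc_page(GFP_KERNEL | __GFP_ZERO);
                if (!page)
                        return VM_FAULT_OOM;
                bo->dummy_page = page;
        }

        /* Back every faulting address of the mapping with the same
         * zeroed RW page. */
        return vmf_insert_pfn(vmf->vma, vmf->address, page_to_pfn(page));
}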

>
> Andrey
>
>
>>
>> Regards,
>> Christian.
>>
>>> If so, where should we store the dummy RW BO in this case ? In 
>>> current implementation it's stored in drm_gem_object.
>>>
>>> P.S For FLINK case it seems to me the handling should be no 
>>> different than with a local BO as the
>>> FD used for mmap in this case is still the same one associated with 
>>> the DRM file.
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-19  7:55                     ` Christian König
@ 2020-11-19 15:02                       ` Andrey Grodzovsky
  2020-11-19 15:29                         ` Daniel Vetter
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-19 15:02 UTC (permalink / raw)
  To: christian.koenig, Daniel Vetter
  Cc: Alex Deucher, Michel Dänzer, amd-gfx list, Pekka Paalanen,
	dri-devel


On 11/19/20 2:55 AM, Christian König wrote:
> Am 18.11.20 um 17:20 schrieb Andrey Grodzovsky:
>>
>> On 11/18/20 7:01 AM, Christian König wrote:
>>> Am 18.11.20 um 08:39 schrieb Daniel Vetter:
>>>> On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky
>>>> <Andrey.Grodzovsky@amd.com> wrote:
>>>>>
>>>>> On 11/17/20 2:49 PM, Daniel Vetter wrote:
>>>>>> On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
>>>>>>> On 11/17/20 1:52 PM, Daniel Vetter wrote:
>>>>>>>> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
>>>>>>>>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
>>>>>>>>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
>>>>>>>>>>> No point to try recovery if device is gone, just messes up things.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>>> ---
>>>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>>>>>>>>>>>      drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++++++++
>>>>>>>>>>>      2 files changed, 24 insertions(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>> index 6932d75..5d6d3d9 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>> @@ -1129,12 +1129,28 @@ static int amdgpu_pci_probe(struct pci_dev 
>>>>>>>>>>> *pdev,
>>>>>>>>>>>           return ret;
>>>>>>>>>>>      }
>>>>>>>>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>>>>>>>>>>> +{
>>>>>>>>>>> +        int i;
>>>>>>>>>>> +
>>>>>>>>>>> +        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>>>>>>> +                struct amdgpu_ring *ring = adev->rings[i];
>>>>>>>>>>> +
>>>>>>>>>>> +                if (!ring || !ring->sched.thread)
>>>>>>>>>>> +                        continue;
>>>>>>>>>>> +
>>>>>>>>>>> + cancel_delayed_work_sync(&ring->sched.work_tdr);
>>>>>>>>>>> +        }
>>>>>>>>>>> +}
>>>>>>>>>> I think this is a function that's supposed to be in drm/scheduler, not
>>>>>>>>>> here. Might also just be your cleanup code being ordered wrongly, or 
>>>>>>>>>> your
>>>>>>>>>> split in one of the earlier patches not done quite right.
>>>>>>>>>> -Daniel
>>>>>>>>> This function iterates across all the schedulers per amdgpu device and 
>>>>>>>>> accesses
>>>>>>>>> amdgpu specific structures , drm/scheduler deals with single scheduler 
>>>>>>>>> at most
>>>>>>>>> so looks to me like this is the right place for this function
>>>>>>>> I guess we could keep track of all schedulers somewhere in a list in
>>>>>>>> struct drm_device and wrap this up. That was kinda the idea.
>>>>>>>>
>>>>>>>> Minimally I think a tiny wrapper with docs for the
>>>>>>>> cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
>>>>>>>> observe to make sure there's no race.
>>>>>>> Will do
>>>>>>>
>>>>>>>
>>>>>>>> I'm not exactly sure there's no
>>>>>>>> guarantee here we won't get a new tdr work launched right afterwards at
>>>>>>>> least, so this looks a bit like a hack.
>>>>>>> Note that for any TDR work happening post amdgpu_cancel_all_tdr
>>>>>>> amdgpu_job_timedout->drm_dev_is_unplugged
>>>>>>> will return true and so it will return early. To make it watertight
>>>>>>> against races
>>>>>>> I can switch from drm_dev_is_unplugged to drm_dev_enter/exit
>>>>>> Hm that's confusing. You do a work_cancel_sync, so that at least looks
>>>>>> like "tdr work must not run after this point"
>>>>>>
>>>>>> If you only rely on drm_dev_enter/exit check with the tdr work, then
>>>>>> there's no need to cancel anything.
>>>>>
>>>>> Agree, synchronize_srcu from drm_dev_unplug should play the role
>>>>> of 'flushing' any earlier (in progress) tdr work which is
>>>>> using drm_dev_enter/exit pair. Any later arising tdr will terminate early 
>>>>> when
>>>>> drm_dev_enter
>>>>> returns false.
>>>> Nope, anything you put into the work itself cannot close this race.
>>>> It's the schedule_work that matters here. Or I'm missing something ...
>>>> I thought that the tdr work you're cancelling here is launched by
>>>> drm/scheduler code, not by the amd callback?
>>
>>
>> My bad, you are right, I am supposed to put the drm_dev_enter/exit pair into 
>> drm_sched_job_timedout
>>
>>
>>>
>>> Yes that is correct. Canceling the work item is not the right approach at 
>>> all, nor is adding dev_enter/exit pair in the recovery handler.
>>
>>
>> Without adding the dev_enter/exit guarding pair in the recovery handler you 
>> are ending up with GPU reset starting while
>> the device is already unplugged, this leads to multiple errors and general mess.
>>
>>
>>>
>>> What we need to do here is to stop the scheduler thread and then wait for 
>>> any timeout handling to have finished.
>>>
>>> Otherwise it can schedule a new timeout just after we have canceled this one.
>>>
>>> Regards,
>>> Christian.
>>
>>
>> Schedulers are stopped from amdgpu_driver_unload_kms which indeed happens 
>> after drm_dev_unplug
>> so yes, there is still a chance for new work being scheduled and a timeout 
>> armed after that but, once I fix the code
>> to place the drm_dev_enter/exit pair into drm_sched_job_timedout I don't see why 
>> that is not a good solution ?
>
> Yeah that should work as well, but then you also don't need to cancel the work 
> item from the driver.


Indeed, as Daniel pointed out there is no need, and I dropped it. One correction - I 
previously said that without the
dev_enter/exit guarding pair in the scheduler's TO handler you will get a GPU reset 
starting while the device is already gone -
of course this does not fully prevent that, as the device can be extracted at 
any moment just after we
have already entered GPU recovery. But it does save us processing a futile GPU 
recovery, which always
starts once you unplug the device if there are active jobs in progress at the 
moment, and so I think it's
still justifiable to keep the dev_enter/exit guarding pair there.

Andrey


>
>
>> Any tdr work started after drm_dev_unplug finished will simply abort on entry 
>> to drm_sched_job_timedout
>> because drm_dev_enter will be false and the function will return without 
>> rearming the timeout timer and
>> so will have no impact.
>>
>> The only issue I see here now is a possible use-after-free if some late tdr 
>> work tries to execute after the
>> drm device is already gone; for this we should probably add 
>> cancel_delayed_work_sync(&sched->work_tdr)
>> to drm_sched_fini after sched->thread is stopped there.
>
> Good point, that is indeed missing as far as I can see.
>
> Christian.
>
>>
>> Andrey
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-19 15:02                       ` Andrey Grodzovsky
@ 2020-11-19 15:29                         ` Daniel Vetter
  2020-11-19 21:24                           ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Daniel Vetter @ 2020-11-19 15:29 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Michel Dänzer, dri-devel, Pekka Paalanen, amd-gfx list,
	Daniel Vetter, Alex Deucher, christian.koenig

On Thu, Nov 19, 2020 at 10:02:28AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/19/20 2:55 AM, Christian König wrote:
> > Am 18.11.20 um 17:20 schrieb Andrey Grodzovsky:
> > > 
> > > On 11/18/20 7:01 AM, Christian König wrote:
> > > > Am 18.11.20 um 08:39 schrieb Daniel Vetter:
> > > > > On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky
> > > > > <Andrey.Grodzovsky@amd.com> wrote:
> > > > > > 
> > > > > > On 11/17/20 2:49 PM, Daniel Vetter wrote:
> > > > > > > On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
> > > > > > > > On 11/17/20 1:52 PM, Daniel Vetter wrote:
> > > > > > > > > On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
> > > > > > > > > > On 6/22/20 5:53 AM, Daniel Vetter wrote:
> > > > > > > > > > > On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
> > > > > > > > > > > > No point to try recovery if device is gone, just messes up things.
> > > > > > > > > > > > 
> > > > > > > > > > > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > > > > > > > > > ---
> > > > > > > > > > > >      drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
> > > > > > > > > > > >      drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++++++++
> > > > > > > > > > > >      2 files changed, 24 insertions(+)
> > > > > > > > > > > > 
> > > > > > > > > > > > diff --git
> > > > > > > > > > > > a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > > > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > > > > > > > index 6932d75..5d6d3d9 100644
> > > > > > > > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > > > > > > > > > > @@ -1129,12 +1129,28 @@ static
> > > > > > > > > > > > int amdgpu_pci_probe(struct
> > > > > > > > > > > > pci_dev *pdev,
> > > > > > > > > > > >           return ret;
> > > > > > > > > > > >      }
> > > > > > > > > > > > +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +        int i;
> > > > > > > > > > > > +
> > > > > > > > > > > > +        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> > > > > > > > > > > > +                struct amdgpu_ring *ring = adev->rings[i];
> > > > > > > > > > > > +
> > > > > > > > > > > > +                if (!ring || !ring->sched.thread)
> > > > > > > > > > > > +                        continue;
> > > > > > > > > > > > +
> > > > > > > > > > > > + cancel_delayed_work_sync(&ring->sched.work_tdr);
> > > > > > > > > > > > +        }
> > > > > > > > > > > > +}
> > > > > > > > > > > I think this is a function that's supposed to be in drm/scheduler, not
> > > > > > > > > > > here. Might also just be your
> > > > > > > > > > > cleanup code being ordered wrongly,
> > > > > > > > > > > or your
> > > > > > > > > > > split in one of the earlier patches not done quite right.
> > > > > > > > > > > -Daniel
> > > > > > > > > > This function iterates across all the
> > > > > > > > > > schedulers per amdgpu device and
> > > > > > > > > > accesses
> > > > > > > > > > amdgpu specific structures ,
> > > > > > > > > > drm/scheduler deals with single
> > > > > > > > > > scheduler at most
> > > > > > > > > > so looks to me like this is the right place for this function
> > > > > > > > > I guess we could keep track of all schedulers somewhere in a list in
> > > > > > > > > struct drm_device and wrap this up. That was kinda the idea.
> > > > > > > > > 
> > > > > > > > > Minimally I think a tiny wrapper with docs for the
> > > > > > > > > cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
> > > > > > > > > observe to make sure there's no race.
> > > > > > > > Will do
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > I'm not exactly sure there's no
> > > > > > > > > guarantee here we won't get a new tdr work launched right afterwards at
> > > > > > > > > least, so this looks a bit like a hack.
> > > > > > > > Note that for any TDR work happening post amdgpu_cancel_all_tdr
> > > > > > > > amdgpu_job_timedout->drm_dev_is_unplugged
> > > > > > > > will return true and so it will return early. To make it watertight
> > > > > > > > against races
> > > > > > > > I can switch from drm_dev_is_unplugged to drm_dev_enter/exit
> > > > > > > Hm that's confusing. You do a work_cancel_sync, so that at least looks
> > > > > > > like "tdr work must not run after this point"
> > > > > > > 
> > > > > > > If you only rely on drm_dev_enter/exit check with the tdr work, then
> > > > > > > there's no need to cancel anything.
> > > > > > 
> > > > > > Agree, synchronize_srcu from drm_dev_unplug should play the role
> > > > > > of 'flushing' any earlier (in progress) tdr work which is
> > > > > > using drm_dev_enter/exit pair. Any later arising tdr
> > > > > > will terminate early when
> > > > > > drm_dev_enter
> > > > > > returns false.
> > > > > Nope, anything you put into the work itself cannot close this race.
> > > > > It's the schedule_work that matters here. Or I'm missing something ...
> > > > > I thought that the tdr work you're cancelling here is launched by
> > > > > drm/scheduler code, not by the amd callback?
> > > 
> > > 
> > > My bad, you are right, I am supposed to put the drm_dev_enter/exit pair
> > > into drm_sched_job_timedout
> > > 
> > > 
> > > > 
> > > > Yes that is correct. Canceling the work item is not the right
> > > > approach at all, nor is adding dev_enter/exit pair in the
> > > > recovery handler.
> > > 
> > > 
> > > Without adding the dev_enter/exit guarding pair in the recovery
> > > handler you are ending up with GPU reset starting while
> > > the device is already unplugged, this leads to multiple errors and general mess.
> > > 
> > > 
> > > > 
> > > > What we need to do here is to stop the scheduler thread and then
> > > > wait for any timeout handling to have finished.
> > > > 
> > > > Otherwise it can schedule a new timeout just after we have canceled this one.
> > > > 
> > > > Regards,
> > > > Christian.
> > > 
> > > 
> > > Schedulers are stopped from amdgpu_driver_unload_kms which indeed
> > > happens after drm_dev_unplug
> > > so yes, there is still a chance for new work being scheduled and a
> > > timeout armed after that but, once I fix the code
> > > to place the drm_dev_enter/exit pair into drm_sched_job_timedout I don't
> > > see why that is not a good solution ?
> > 
> > Yeah that should work as well, but then you also don't need to cancel
> > the work item from the driver.
> 
> 
> Indeed, as Daniel pointed out no need and I dropped it. One correction - I
> previously said that w/o
> dev_enter/exit guarding pair in scheduler's TO handler you will get GPU
> reset starting while device already gone -
> of course this is not fully preventing this as the device can be extracted
> at any moment just after we
> already entered GPU recovery. But it does save us processing a futile GPU
> recovery which always
> starts once you unplug the device if there are active jobs in progress at
> the moment and so I think it's
> still justifiable to keep the dev_enter/exit guarding pair there.

Yeah sprinkling drm_dev_enter/exit over the usual suspect code paths like
tdr to make the entire unloading much faster makes sense. Waiting for
enormous amounts of mmio ops to time out isn't fun. A comment might be
good for that though, to explain why we're doing that.
-Daniel
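
For example, the guard could carry a comment along these lines (a sketch; wording and exact placement are illustrative):

if (!drm_dev_enter(ddev, &idx)) {
        /*
         * Device is gone: skip GPU recovery entirely. A reset at this
         * point would only hammer dead MMIO and wait out long register
         * timeouts, making the actual unplug path much slower for no
         * benefit.
         */
        return;
}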

> 
> Andrey
> 
> 
> > 
> > 
> > > Any tdr work started after drm_dev_unplug finished will simply abort
> > > on entry to drm_sched_job_timedout
> > > because drm_dev_enter will be false and the function will return
> > > without rearming the timeout timer and
> > > so will have no impact.
> > > 
> > > The only issue i see here now is of possible use after free if some
> > > late tdr work will try to execute after
> > > drm device already gone, for this we probably should add
> > > cancel_delayed_work_sync(sched.work_tdr)
> > > to drm_sched_fini after sched->thread is stopped there.
> > 
> > Good point, that is indeed missing as far as I can see.
> > 
> > Christian.
> > 
> > > 
> > > Andrey
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged.
  2020-11-19 15:29                         ` Daniel Vetter
@ 2020-11-19 21:24                           ` Andrey Grodzovsky
  0 siblings, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-11-19 21:24 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Michel Dänzer, amd-gfx list, Pekka Paalanen, dri-devel,
	Alex Deucher, christian.koenig


On 11/19/20 10:29 AM, Daniel Vetter wrote:
> On Thu, Nov 19, 2020 at 10:02:28AM -0500, Andrey Grodzovsky wrote:
>> On 11/19/20 2:55 AM, Christian König wrote:
>>> Am 18.11.20 um 17:20 schrieb Andrey Grodzovsky:
>>>> On 11/18/20 7:01 AM, Christian König wrote:
>>>>> Am 18.11.20 um 08:39 schrieb Daniel Vetter:
>>>>>> On Tue, Nov 17, 2020 at 9:07 PM Andrey Grodzovsky
>>>>>> <Andrey.Grodzovsky@amd.com> wrote:
>>>>>>> On 11/17/20 2:49 PM, Daniel Vetter wrote:
>>>>>>>> On Tue, Nov 17, 2020 at 02:18:49PM -0500, Andrey Grodzovsky wrote:
>>>>>>>>> On 11/17/20 1:52 PM, Daniel Vetter wrote:
>>>>>>>>>> On Tue, Nov 17, 2020 at 01:38:14PM -0500, Andrey Grodzovsky wrote:
>>>>>>>>>>> On 6/22/20 5:53 AM, Daniel Vetter wrote:
>>>>>>>>>>>> On Sun, Jun 21, 2020 at 02:03:08AM -0400, Andrey Grodzovsky wrote:
>>>>>>>>>>>>> No point to try recovery if device is gone, just messes up things.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 16 ++++++++++++++++
>>>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 8 ++++++++
>>>>>>>>>>>>>       2 files changed, 24 insertions(+)
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>>>> index 6932d75..5d6d3d9 100644
>>>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>>>>>>>>>> @@ -1129,12 +1129,28 @@ static
>>>>>>>>>>>>> int amdgpu_pci_probe(struct
>>>>>>>>>>>>> pci_dev *pdev,
>>>>>>>>>>>>>            return ret;
>>>>>>>>>>>>>       }
>>>>>>>>>>>>> +static void amdgpu_cancel_all_tdr(struct amdgpu_device *adev)
>>>>>>>>>>>>> +{
>>>>>>>>>>>>> +        int i;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>>>>>>>>>>>> +                struct amdgpu_ring *ring = adev->rings[i];
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +                if (!ring || !ring->sched.thread)
>>>>>>>>>>>>> +                        continue;
>>>>>>>>>>>>> +
>>>>>>>>>>>>> + cancel_delayed_work_sync(&ring->sched.work_tdr);
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>> +}
>>>>>>>>>>>> I think this is a function that's supposed to be in drm/scheduler, not
>>>>>>>>>>>> here. Might also just be your
>>>>>>>>>>>> cleanup code being ordered wrongly,
>>>>>>>>>>>> or your
>>>>>>>>>>>> split in one of the earlier patches not done quite right.
>>>>>>>>>>>> -Daniel
>>>>>>>>>>> This function iterates across all the
>>>>>>>>>>> schedulers per amdgpu device and
>>>>>>>>>>> accesses
>>>>>>>>>>> amdgpu specific structures ,
>>>>>>>>>>> drm/scheduler deals with single
>>>>>>>>>>> scheduler at most
>>>>>>>>>>> so looks to me like this is the right place for this function
>>>>>>>>>> I guess we could keep track of all schedulers somewhere in a list in
>>>>>>>>>> struct drm_device and wrap this up. That was kinda the idea.
>>>>>>>>>>
>>>>>>>>>> Minimally I think a tiny wrapper with docs for the
>>>>>>>>>> cancel_delayed_work_sync(&sched->work_tdr); which explains what you must
>>>>>>>>>> observe to make sure there's no race.
>>>>>>>>> Will do
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I'm not exactly sure there's no
>>>>>>>>>> guarantee here we won't get a new tdr work launched right afterwards at
>>>>>>>>>> least, so this looks a bit like a hack.
>>>>>>>>> Note that for any TDR work happening post amdgpu_cancel_all_tdr
>>>>>>>>> amdgpu_job_timedout->drm_dev_is_unplugged
>>>>>>>>> will return true and so it will return early. To make it watertight
>>>>>>>>> against races
>>>>>>>>> I can switch from drm_dev_is_unplugged to drm_dev_enter/exit
>>>>>>>> Hm that's confusing. You do a work_cancel_sync, so that at least looks
>>>>>>>> like "tdr work must not run after this point"
>>>>>>>>
>>>>>>>> If you only rely on drm_dev_enter/exit check with the tdr work, then
>>>>>>>> there's no need to cancel anything.
>>>>>>> Agree, synchronize_srcu from drm_dev_unplug should play the role
>>>>>>> of 'flushing' any earlier (in progress) tdr work which is
>>>>>>> using drm_dev_enter/exit pair. Any later arising tdr
>>>>>>> will terminate early when
>>>>>>> drm_dev_enter
>>>>>>> returns false.
>>>>>> Nope, anything you put into the work itself cannot close this race.
>>>>>> It's the schedule_work that matters here. Or I'm missing something ...
>>>>>> I thought that the tdr work you're cancelling here is launched by
>>>>>> drm/scheduler code, not by the amd callback?
>>>>
>>>> My bad, you are right, I am supposed to put the drm_dev_enter/exit pair
>>>> into drm_sched_job_timedout
>>>>
>>>>
>>>>> Yes that is correct. Canceling the work item is not the right
>>>>> approach at all, nor is adding dev_enter/exit pair in the
>>>>> recovery handler.
>>>>
>>>> Without adding the dev_enter/exit guarding pair in the recovery
>>>> handler you are ending up with GPU reset starting while
>>>> the device is already unplugged, this leads to multiple errors and general mess.
>>>>
>>>>
>>>>> What we need to do here is to stop the scheduler thread and then
>>>>> wait for any timeout handling to have finished.
>>>>>
>>>>> Otherwise it can schedule a new timeout just after we have canceled this one.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>
>>>> Schedulers are stopped from amdgpu_driver_unload_kms which indeed
>>>> happens after drm_dev_unplug
>>>> so yes, there is still a chance for new work being scheduled and a
>>>> timeout armed after that but, once I fix the code
>>>> to place the drm_dev_enter/exit pair into drm_sched_job_timedout I don't
>>>> see why that is not a good solution ?
>>> Yeah that should work as well, but then you also don't need to cancel
>>> the work item from the driver.
>>
>> Indeed, as Daniel pointed out no need and I dropped it. One correction - I
>> previously said that w/o
>> dev_enter/exit guarding pair in scheduler's TO handler you will get GPU
>> reset starting while device already gone -
>> of course this is not fully preventing this as the device can be extracted
>> at any moment just after we
>> already entered GPU recovery. But it does save us processing a futile GPU
>> recovery which always
>> starts once you unplug the device if there are active jobs in progress at
>> the moment and so I think it's
>> still justifiable to keep the dev_enter/exit guarding pair there.
> Yeah sprinkling drm_dev_enter/exit over the usual suspect code paths like
> tdr to make the entire unloading much faster makes sense. Waiting for
> enormous amounts of mmio ops to time out isn't fun. A comment might be
> good for that though, to explain why we're doing that.
> -Daniel


Will do. I also tried to insert drm_dev_enter/exit in all MMIO accessors in amdgpu
to try and avoid the problem at that level, but didn't get good results for an 
unclear reason; I will probably get
to this as follow-up work, again to avoid expanding the scope of the current work 
too much.

Andrey
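
A hedged sketch of what guarding a single MMIO accessor could look like; the helper name is illustrative, and amdgpu's real RREG32 path has more cases than shown here:

static u32 amdgpu_mm_rreg_guarded(struct amdgpu_device *adev, u32 reg)
{
        u32 val = 0;
        int idx;

        if (drm_dev_enter(adev_to_drm(adev), &idx)) {
                val = readl(adev->rmmio + (reg * 4));
                drm_dev_exit(idx);
        }

        /* After unplug the read is skipped and 0 is returned
         * immediately instead of waiting out a dead bus. */
        return val;
}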


>
>> Andrey
>>
>>
>>>
>>>> Any tdr work started after drm_dev_unplug finished will simply abort
>>>> on entry to drm_sched_job_timedout
>>>> because drm_dev_enter will be false and the function will return
>>>> without rearming the timeout timer and
>>>> so will have no impact.
>>>>
>>>> The only issue i see here now is of possible use after free if some
>>>> late tdr work will try to execute after
>>>> drm device already gone, for this we probably should add
>>>> cancel_delayed_work_sync(sched.work_tdr)
>>>> to drm_sched_fini after sched->thread is stopped there.
>>> Good point, that is indeed missing as far as I can see.
>>>
>>> Christian.
>>>
>>>> Andrey
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-11-11 15:34                         ` Greg KH
  2020-11-11 15:45                           ` Andrey Grodzovsky
@ 2020-12-02 15:48                           ` Andrey Grodzovsky
  2020-12-02 17:34                             ` Greg KH
  1 sibling, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-12-02 15:48 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


On 11/11/20 10:34 AM, Greg KH wrote:
> On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
>> On 11/10/20 12:59 PM, Greg KH wrote:
>>> On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
>>>> Hi, back to this after a long context switch for some higher priority stuff.
>>>>
>>>> So here I was able eventually to drop all this code and this change here https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
>>>> was enough for me. Seems like while device_remove_file can handle the use
>>>> case where the file and the parent directory are already gone,
>>>> sysfs_remove_group goes down in flames in that case
>>>> due to kobj->sd being unset on device removal.
>>> A driver shouldn't ever have to remove individual sysfs groups, the
>>> driver core/bus logic should do it for them automatically.
>>>
>>> And whenever a driver calls a sysfs_* call, that's a hint that something
>>> is not working properly.
>>
>>
>> Do you mean that while the driver creates the groups and files explicitly
>> from its different subsystems it should not explicitly remove each
>> one of them because all of them should be removed at once (and
>> recursively) when the device is being removed ?
> Individual drivers should never add groups/files in sysfs, the driver
> core should do it properly for you if you have everything set up
> properly.  And yes, the driver core will automatically remove them as
> well.
>
> Please use the default groups attribute for your bus/subsystem and this
> will happen automagically.


Hi Greg, I tried your suggestion to hang amdgpu's sysfs
attributes on the default attributes in struct device.groups, but it turns out this 
is not usable, since by the
time I have access to struct device from amdgpu code it has already been 
initialized by the pci core
(i.e. past the point where device_add->device_add_attrs->device_add_groups with 
dev->groups is called),
and so I can't really use it.

The only alternative I can think of is creating my own struct attribute_group ** 
array in amdgpu where I aggregate all
amdgpu sysfs attributes, call device_add_groups at the end of amdgpu pci probe 
with that array, and on device remove call
device_remove_groups with the same array.

Do you maybe have a better suggestion for me ?

Andrey
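
For concreteness, a sketch of the fallback described above, which Greg rejects just below; amdgpu_sysfs_groups and its members are hypothetical:

static const struct attribute_group *amdgpu_sysfs_groups[] = {
        &amdgpu_pm_attr_group,          /* hypothetical existing groups */
        &amdgpu_vram_attr_group,
        NULL,
};

/* at the end of amdgpu_pci_probe() */
ret = device_add_groups(&pdev->dev, amdgpu_sysfs_groups);

/* in amdgpu_pci_remove(), before the device goes away */
device_remove_groups(&pdev->dev, amdgpu_sysfs_groups);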


>
> thanks,
>
> greg k-h
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-12-02 15:48                           ` Andrey Grodzovsky
@ 2020-12-02 17:34                             ` Greg KH
  2020-12-02 18:02                               ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Greg KH @ 2020-12-02 17:34 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Wed, Dec 02, 2020 at 10:48:01AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/11/20 10:34 AM, Greg KH wrote:
> > On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
> > > On 11/10/20 12:59 PM, Greg KH wrote:
> > > > On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> > > > > Hi, back to this after a long context switch for some higher priority stuff.
> > > > > 
> > > > > So here I was able eventually to drop all this code and this change here https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
> > > > > was enough for me. Seems like while device_remove_file can handle the use
> > > > > case where the file and the parent directory are already gone,
> > > > > sysfs_remove_group goes down in flames in that case
> > > > > due to kobj->sd being unset on device removal.
> > > > A driver shouldn't ever have to remove individual sysfs groups, the
> > > > driver core/bus logic should do it for them automatically.
> > > > 
> > > > And whenever a driver calls a sysfs_* call, that's a hint that something
> > > > is not working properly.
> > > 
> > > 
> > > Do you mean that while the driver creates the groups and files explicitly
> > > from its different subsystems, it should not explicitly remove each
> > > one of them, because all of them should be removed at once (and
> > > recursively) when the device is being removed?
> > Individual drivers should never add groups/files in sysfs, the driver
> > core should do it properly for you if you have everything set up
> > properly.  And yes, the driver core will automatically remove them as
> > well.
> > 
> > Please use the default groups attribute for your bus/subsystem and this
> > will happen automagically.
> 
> 
> Hi Greg, I tried your suggestion to hang amdgpu's sysfs attributes on the
> default attributes in struct device.groups, but it turns out that's not
> usable: by the time I have access to struct device from amdgpu code, it
> has already been initialized by the PCI core (i.e. we are past the point
> where device_add->device_add_attrs->device_add_groups is called with
> dev->groups), so I can't really use it.

That's odd, why can't you just set the groups pointer in your pci_driver
structure?  That's what it is there for, right?

> The only alternative I can think of is creating my own struct
> attribute_group ** array in amdgpu that aggregates all the amdgpu sysfs
> attributes, calling device_add_groups with that array at the end of the
> amdgpu PCI probe, and calling device_remove_groups with the same array on
> device removal.

Horrid, no, see above :)

thanks,

greg k-h
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-12-02 17:34                             ` Greg KH
@ 2020-12-02 18:02                               ` Andrey Grodzovsky
  2020-12-02 18:20                                 ` Greg KH
  0 siblings, 1 reply; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-12-02 18:02 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


On 12/2/20 12:34 PM, Greg KH wrote:
> On Wed, Dec 02, 2020 at 10:48:01AM -0500, Andrey Grodzovsky wrote:
>> On 11/11/20 10:34 AM, Greg KH wrote:
>>> On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
>>>> On 11/10/20 12:59 PM, Greg KH wrote:
>>>>> On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
>>>>>> Hi, back to this after a long context switch for some higher priority stuff.
>>>>>>
>>>>>> So here I was eventually able to drop all this code; this change here
>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
>>>>>> was enough for me. Seems like while device_remove_file can handle the
>>>>>> use case where the file and the parent directory are already gone,
>>>>>> sysfs_remove_group goes down in flames in that case
>>>>>> due to kobj->sd being unset on device removal.
>>>>> A driver shouldn't ever have to remove individual sysfs groups, the
>>>>> driver core/bus logic should do it for them automatically.
>>>>>
>>>>> And whenever a driver calls a sysfs_* call, that's a hint that something
>>>>> is not working properly.
>>>>
>>>> Do you mean that while the driver creates the groups and files explicitly
>>>> from its different subsystems, it should not explicitly remove each
>>>> one of them, because all of them should be removed at once (and
>>>> recursively) when the device is being removed?
>>> Individual drivers should never add groups/files in sysfs, the driver
>>> core should do it properly for you if you have everything set up
>>> properly.  And yes, the driver core will automatically remove them as
>>> well.
>>>
>>> Please use the default groups attribute for your bus/subsystem and this
>>> will happen automagically.
>>
>> Hi Greg, I tried your suggestion to hang amdgpu's sysfs attributes on the
>> default attributes in struct device.groups, but it turns out that's not
>> usable: by the time I have access to struct device from amdgpu code, it
>> has already been initialized by the PCI core (i.e. we are past the point
>> where device_add->device_add_attrs->device_add_groups is called with
>> dev->groups), so I can't really use it.
> That's odd, why can't you just set the groups pointer in your pci_driver
> structure?  That's what it is there for, right?

I am probably missing something, but amdgpu sysfs attrs are per device,
not per driver: their life cycle is bound to the device, and their
location in the sysfs topology is under each device. Putting them in the
driver's default attrs will not put them in their current per-device
location and won't make them automatically be destroyed once a particular
device goes away, no?
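(To illustrate what I mean by per device -- a typical attribute of this
kind is declared against the device, not the driver; a made-up example,
not actual amdgpu code:)

	/* appears under /sys/bus/pci/devices/<bdf>/foo for each device */
	static ssize_t foo_show(struct device *dev,
				struct device_attribute *attr, char *buf)
	{
		return scnprintf(buf, PAGE_SIZE, "%d\n", 42);
	}
	static DEVICE_ATTR_RO(foo);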

Andrey


>
>> What I can only think of using is creating my own struct attribute_group **
>> array in amdgpu where I aggregate all
>> amdgpu sysfs attributes, call device_add_groups in the end of amgpu pci
>> probe with that array and on device remove call
>> device_remove_groups with the same array.
> Horrid, no, see above :)
>
> thanks,
>
> greg k-h
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-12-02 18:02                               ` Andrey Grodzovsky
@ 2020-12-02 18:20                                 ` Greg KH
  2020-12-02 18:40                                   ` Andrey Grodzovsky
  0 siblings, 1 reply; 97+ messages in thread
From: Greg KH @ 2020-12-02 18:20 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher

On Wed, Dec 02, 2020 at 01:02:06PM -0500, Andrey Grodzovsky wrote:
> 
> On 12/2/20 12:34 PM, Greg KH wrote:
> > On Wed, Dec 02, 2020 at 10:48:01AM -0500, Andrey Grodzovsky wrote:
> > > On 11/11/20 10:34 AM, Greg KH wrote:
> > > > On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
> > > > > On 11/10/20 12:59 PM, Greg KH wrote:
> > > > > > On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
> > > > > > > Hi, back to this after a long context switch for some higher priority stuff.
> > > > > > > 
> > > > > > > So here I was eventually able to drop all this code; this change here
> > > > > > > https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
> > > > > > > was enough for me. Seems like while device_remove_file can handle the
> > > > > > > use case where the file and the parent directory are already gone,
> > > > > > > sysfs_remove_group goes down in flames in that case
> > > > > > > due to kobj->sd being unset on device removal.
> > > > > > A driver shouldn't ever have to remove individual sysfs groups, the
> > > > > > driver core/bus logic should do it for them automatically.
> > > > > > 
> > > > > > And whenever a driver calls a sysfs_* call, that's a hint that something
> > > > > > is not working properly.
> > > > > 
> > > > > Do you mean that while the driver creates the groups and files explicitly
> > > > > from its different subsystems, it should not explicitly remove each
> > > > > one of them, because all of them should be removed at once (and
> > > > > recursively) when the device is being removed?
> > > > Individual drivers should never add groups/files in sysfs, the driver
> > > > core should do it properly for you if you have everything set up
> > > > properly.  And yes, the driver core will automatically remove them as
> > > > well.
> > > > 
> > > > Please use the default groups attribute for your bus/subsystem and this
> > > > will happen automagically.
> > > 
> > > Hi Greg, I tried your suggestion to hang amdgpu's sysfs attributes on the
> > > default attributes in struct device.groups, but it turns out that's not
> > > usable: by the time I have access to struct device from amdgpu code, it
> > > has already been initialized by the PCI core (i.e. we are past the point
> > > where device_add->device_add_attrs->device_add_groups is called with
> > > dev->groups), so I can't really use it.
> > That's odd, why can't you just set the groups pointer in your pci_driver
> > structure?  That's what it is there for, right?
> 
> I am probably missing something, but amdgpu sysfs attrs are per device,
> not per driver

Oops, you are right, you want the 'dev_groups' field.  Looks like pci
doesn't export that directly, so you can do:
	.driver = {
		.dev_groups = my_device_groups,
	},
in your pci_driver structure.
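Fleshed out a little (just a sketch; the my_* names and dev_attr_foo are
placeholders, not real symbols):

	static struct attribute *my_device_attrs[] = {
		&dev_attr_foo.attr,
		NULL,
	};
	ATTRIBUTE_GROUPS(my_device);	/* generates my_device_groups */

	static struct pci_driver my_pci_driver = {
		.name		= "my_driver",
		.id_table	= my_pci_ids,
		.probe		= my_probe,
		.remove		= my_remove,
		.driver = {
			.dev_groups	= my_device_groups,
		},
	};

The driver core then creates those files after probe() succeeds and
removes them again on unbind, so the driver never touches sysfs itself.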

Or I'm sure the PCI driver maintainer would take a patch like
7d9c1d2f7aca ("USB: add support for dev_groups to struct
usb_device_driver"), which was done for the USB subsystem, as diving into
the "raw" .driver pointer isn't really all that clean or nice in my opinion.

thanks,

greg k-h
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal
  2020-12-02 18:20                                 ` Greg KH
@ 2020-12-02 18:40                                   ` Andrey Grodzovsky
  0 siblings, 0 replies; 97+ messages in thread
From: Andrey Grodzovsky @ 2020-12-02 18:40 UTC (permalink / raw)
  To: Greg KH
  Cc: daniel.vetter, michel, dri-devel, ppaalanen, amd-gfx,
	Daniel Vetter, ckoenig.leichtzumerken, alexdeucher


On 12/2/20 1:20 PM, Greg KH wrote:
> On Wed, Dec 02, 2020 at 01:02:06PM -0500, Andrey Grodzovsky wrote:
>> On 12/2/20 12:34 PM, Greg KH wrote:
>>> On Wed, Dec 02, 2020 at 10:48:01AM -0500, Andrey Grodzovsky wrote:
>>>> On 11/11/20 10:34 AM, Greg KH wrote:
>>>>> On Wed, Nov 11, 2020 at 10:13:13AM -0500, Andrey Grodzovsky wrote:
>>>>>> On 11/10/20 12:59 PM, Greg KH wrote:
>>>>>>> On Tue, Nov 10, 2020 at 12:54:21PM -0500, Andrey Grodzovsky wrote:
>>>>>>>> Hi, back to this after a long context switch for some higher priority stuff.
>>>>>>>>
>>>>>>>> So here I was eventually able to drop all this code; this change here
>>>>>>>> https://cgit.freedesktop.org/~agrodzov/linux/commit/?h=amd-staging-drm-next-device-unplug&id=61852c8a59b4dd89d637693552c73175b9f2ccd6
>>>>>>>> was enough for me. Seems like while device_remove_file can handle the
>>>>>>>> use case where the file and the parent directory are already gone,
>>>>>>>> sysfs_remove_group goes down in flames in that case
>>>>>>>> due to kobj->sd being unset on device removal.
>>>>>>> A driver shouldn't ever have to remove individual sysfs groups, the
>>>>>>> driver core/bus logic should do it for them automatically.
>>>>>>>
>>>>>>> And whenever a driver calls a sysfs_* call, that's a hint that something
>>>>>>> is not working properly.
>>>>>> Do you mean that while the driver creates the groups and files explicitly
>>>>>> from its different subsystems, it should not explicitly remove each
>>>>>> one of them, because all of them should be removed at once (and
>>>>>> recursively) when the device is being removed?
>>>>> Individual drivers should never add groups/files in sysfs, the driver
>>>>> core should do it properly for you if you have everything set up
>>>>> properly.  And yes, the driver core will automatically remove them as
>>>>> well.
>>>>>
>>>>> Please use the default groups attribute for your bus/subsystem and this
>>>>> will happen automagically.
>>>> Hi Greg, I tried your suggestion to hang amdgpu's sysfs attributes on the
>>>> default attributes in struct device.groups, but it turns out that's not
>>>> usable: by the time I have access to struct device from amdgpu code, it
>>>> has already been initialized by the PCI core (i.e. we are past the point
>>>> where device_add->device_add_attrs->device_add_groups is called with
>>>> dev->groups), so I can't really use it.
>>> That's odd, why can't you just set the groups pointer in your pci_driver
>>> structure?  That's what it is there for, right?
>> I am probably missing something, but amdgpu sysfs attrs are per device,
>> not per driver
> Oops, you are right, you want the 'dev_groups' field.  Looks like pci
> doesn't export that directly, so you can do:
> 	.driver = {
> 		.dev_groups = my_device_groups,
> 	},
> in your pci_driver structure.
>
> Or I'm sure the PCI driver maintainer would take a patch like
> 7d9c1d2f7aca ("USB: add support for dev_groups to struct
> usb_device_driver"), which was done for the USB subsystem, as diving into
> the "raw" .driver pointer isn't really all that clean or nice in my opinion.


Looks like exactly what I need. I will probably start by assigning the raw
pointer just to push my work ahead, and in parallel submit a patch like
yours for PCI subsystem review, as the rework to switch over is really
minimal.
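Presumably the PCI side would mirror what 7d9c1d2f7aca did for USB --
roughly something like this (an untested sketch, not an actual patch):

	/* include/linux/pci.h: struct pci_driver would grow */
	const struct attribute_group **dev_groups;

	/* drivers/pci/pci-driver.c: __pci_register_driver() would then
	 * propagate it to the driver core:
	 */
	drv->driver.dev_groups = drv->dev_groups;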

Andrey


>
> thanks,
>
> greg k-h
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2020-12-02 19:11 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-21  6:03 [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
2020-06-21  6:03 ` [PATCH v2 1/8] drm: Add dummy page per device or GEM object Andrey Grodzovsky
2020-06-22  9:35   ` Daniel Vetter
2020-06-22 14:21     ` Pekka Paalanen
2020-06-22 14:24       ` Daniel Vetter
2020-06-22 14:28         ` Pekka Paalanen
2020-11-09 20:34     ` Andrey Grodzovsky
2020-11-15  6:39     ` Andrey Grodzovsky
2020-06-22 13:18   ` Christian König
2020-06-22 14:23     ` Daniel Vetter
2020-06-22 14:32     ` Andrey Grodzovsky
2020-06-22 17:45       ` Christian König
2020-06-22 17:50         ` Daniel Vetter
2020-11-09 20:53           ` Andrey Grodzovsky
2020-11-13 20:52           ` Andrey Grodzovsky
2020-11-14  8:41             ` Christian König
2020-11-14  9:51               ` Daniel Vetter
2020-11-14  9:57                 ` Daniel Vetter
2020-11-16  9:42                   ` Michel Dänzer
2020-11-15  6:34                 ` Andrey Grodzovsky
2020-11-16  9:48                   ` Christian König
2020-11-16 19:00                     ` Andrey Grodzovsky
2020-11-16 20:36                       ` Christian König
2020-11-16 20:42                         ` Andrey Grodzovsky
2020-11-19 10:01                           ` Christian König
2020-06-21  6:03 ` [PATCH v2 2/8] drm/ttm: Remap all page faults to per process dummy page Andrey Grodzovsky
2020-06-22  9:41   ` Daniel Vetter
2020-06-24  3:31     ` Andrey Grodzovsky
2020-06-24  7:19       ` Daniel Vetter
2020-11-10 17:41     ` Andrey Grodzovsky
2020-06-22 19:30   ` Christian König
2020-06-21  6:03 ` [PATCH v2 3/8] drm/ttm: Add unmapping of the entire device address space Andrey Grodzovsky
2020-06-22  9:45   ` Daniel Vetter
2020-06-23  5:00     ` Andrey Grodzovsky
2020-06-23 10:25       ` Daniel Vetter
2020-06-23 12:55         ` Christian König
2020-06-22 19:37   ` Christian König
2020-06-22 19:47   ` Alex Deucher
2020-06-21  6:03 ` [PATCH v2 4/8] drm/amdgpu: Split amdgpu_device_fini into early and late Andrey Grodzovsky
2020-06-22  9:48   ` Daniel Vetter
2020-11-12  4:19     ` Andrey Grodzovsky
2020-11-12  9:29       ` Daniel Vetter
2020-06-21  6:03 ` [PATCH v2 5/8] drm/amdgpu: Refactor sysfs removal Andrey Grodzovsky
2020-06-22  9:51   ` Daniel Vetter
2020-06-22 11:21     ` Greg KH
2020-06-22 16:07       ` Andrey Grodzovsky
2020-06-22 16:45         ` Greg KH
2020-06-23  4:51           ` Andrey Grodzovsky
2020-06-23  6:05             ` Greg KH
2020-06-24  3:04               ` Andrey Grodzovsky
2020-06-24  6:11                 ` Greg KH
2020-06-25  1:52                   ` Andrey Grodzovsky
2020-11-10 17:54                   ` Andrey Grodzovsky
2020-11-10 17:59                     ` Greg KH
2020-11-11 15:13                       ` Andrey Grodzovsky
2020-11-11 15:34                         ` Greg KH
2020-11-11 15:45                           ` Andrey Grodzovsky
2020-11-11 16:06                             ` Greg KH
2020-11-11 16:34                               ` Andrey Grodzovsky
2020-12-02 15:48                           ` Andrey Grodzovsky
2020-12-02 17:34                             ` Greg KH
2020-12-02 18:02                               ` Andrey Grodzovsky
2020-12-02 18:20                                 ` Greg KH
2020-12-02 18:40                                   ` Andrey Grodzovsky
2020-06-22 13:19   ` Christian König
2020-06-21  6:03 ` [PATCH v2 6/8] drm/amdgpu: Unmap entire device address space on device remove Andrey Grodzovsky
2020-06-22  9:56   ` Daniel Vetter
2020-06-22 19:38   ` Christian König
2020-06-22 19:48     ` Alex Deucher
2020-06-23 10:22       ` Daniel Vetter
2020-06-23 13:16         ` Christian König
2020-06-24  3:12           ` Andrey Grodzovsky
2020-06-21  6:03 ` [PATCH v2 7/8] drm/amdgpu: Fix sdma code crash post device unplug Andrey Grodzovsky
2020-06-22  9:55   ` Daniel Vetter
2020-06-22 19:40   ` Christian König
2020-06-23  5:11     ` Andrey Grodzovsky
2020-06-23  7:14       ` Christian König
2020-06-21  6:03 ` [PATCH v2 8/8] drm/amdgpu: Prevent any job recoveries after device is unplugged Andrey Grodzovsky
2020-06-22  9:53   ` Daniel Vetter
2020-11-17 18:38     ` Andrey Grodzovsky
2020-11-17 18:52       ` Daniel Vetter
2020-11-17 19:18         ` Andrey Grodzovsky
2020-11-17 19:49           ` Daniel Vetter
2020-11-17 20:07             ` Andrey Grodzovsky
2020-11-18  7:39               ` Daniel Vetter
2020-11-18 12:01                 ` Christian König
2020-11-18 15:43                   ` Luben Tuikov
2020-11-18 16:20                   ` Andrey Grodzovsky
2020-11-19  7:55                     ` Christian König
2020-11-19 15:02                       ` Andrey Grodzovsky
2020-11-19 15:29                         ` Daniel Vetter
2020-11-19 21:24                           ` Andrey Grodzovsky
2020-11-18  0:46             ` Luben Tuikov
2020-06-22  9:46 ` [PATCH v2 0/8] RFC Support hot device unplug in amdgpu Daniel Vetter
2020-06-23  5:14   ` Andrey Grodzovsky
2020-06-23  9:04     ` Michel Dänzer
2020-06-24  3:21       ` Andrey Grodzovsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).