* [PATCH v3 00/12] RFC Support hot device unplug in amdgpu
@ 2020-11-21  5:21 ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Until now, extracting a card, either by physical removal (e.g. an eGPU with a 
Thunderbolt connection) or by emulation through sysfs (/sys/bus/pci/devices/<device_id>/remove), 
would cause random crashes in user apps. The crashes were mostly due to an app 
that had mapped a device-backed BO into its address space still trying to 
access the BO while the backing device was gone.
To address this first problem, Christian suggested fixing the handling of mapped 
memory in the clients when the device goes away: forcibly unmap all buffers held by 
user processes by clearing the respective VMAs mapping the device BOs. 
When a VMA then tries to fill in its page tables again, we check in the fault 
handler whether the device has been removed and, if so, return an error. This generates a 
SIGBUS to the application, which can then terminate cleanly. This was indeed done, 
but it in turn created a problem of kernel OOPSes: while the app was terminating 
because of the SIGBUS, it would trigger a use-after-free in the driver by accessing 
device structures that had already been released in the PCI remove sequence. That 
was handled by introducing a 'flush' sequence during device removal, in which we wait 
for the DRM file reference count to drop to 0, 
meaning all user clients directly using this device have terminated.

v2:
Based on discussions on the mailing list with Daniel and Pekka [1], and on the document 
Pekka produced from those discussions [2], the whole approach of returning SIGBUS and 
waiting for all user clients with CPU mappings of device BOs to die was dropped. 
Instead, as the document suggests, the device structures are kept alive until 
the last reference to the device is dropped by a user client, and in the meanwhile all 
existing and new CPU mappings of BOs belonging to the device, directly or by dma-buf 
import, are rerouted to a per-user-process dummy RW page. Also, I skipped the 
'Requirements for KMS UAPI' section of [2], since I am trying to get the minimal set of 
requirements that still gives a useful solution to work, and that is the 
'Requirements for Render and Cross-Device UAPI' section; hence my test case is 
removing a secondary device, which is render-only and not involved in KMS.

v3:
More updates following comments on v2, such as removing the loop to find the DRM file 
when rerouting page faults to the dummy page, getting rid of unnecessary sysfs handling 
refactoring, and moving the prevention of GPU recovery post device unplug from amdgpu 
to the scheduler layer. On top of that, added unplug support for IOMMU-enabled systems.

With these patches I am able to gracefully remove the secondary card using the sysfs 
remove hook while glxgears is running off the secondary card (DRI_PRIME=1), without 
kernel oopses or hangs, and can keep working with the primary card or soft-reset the 
device without hangs or oopses.
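
For reference, that test can be driven from the shell roughly as follows; the PCI
address below is a placeholder for whatever the secondary, render-only card
enumerates as on a given system (look it up with lspci):

```sh
# Hypothetical BDF of the secondary card; substitute your own.
BDF=0000:05:00.0

# Run a client on the secondary GPU, then yank the device out from under it.
DRI_PRIME=1 glxgears &
sleep 5
echo 1 | sudo tee /sys/bus/pci/devices/$BDF/remove   # emulated hot-unplug

# Rescanning the bus rediscovers the device (graceful re-plug is still a TODO).
echo 1 | sudo tee /sys/bus/pci/rescan
```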

TODOs for follow-up work:
Convert AMDGPU code to use devm (for HW stuff) and drmm (for SW stuff and allocations) (Daniel)
Rework AMDGPU sysfs handling using default attribute groups (Greg)
Support plugging the secondary device back in after unplug - currently still experiencing a HW error on re-plug.
Add support for the 'Requirements for KMS UAPI' section of [2] - unplugging the primary, display-connected card.

[1] - Discussions during v2 of the patchset https://lists.freedesktop.org/archives/amd-gfx/2020-June/050806.html
[2] - drm/doc: device hot-unplug for userspace https://www.spinics.net/lists/dri-devel/msg259755.html
[3] - Related gitlab ticket https://gitlab.freedesktop.org/drm/amd/-/issues/1081


Andrey Grodzovsky (12):
  drm: Add dummy page per device or GEM object
  drm: Unmap the entire device address space on device unplug
  drm/ttm: Remap all page faults to per process dummy page.
  drm/ttm: Set dma addr to null after free
  drm/ttm: Expose ttm_tt_unpopulate for driver use
  drm/sched: Cancel and flush all outstanding jobs before finish.
  drm/sched: Prevent any job recoveries after device is unplugged.
  drm/amdgpu: Split amdgpu_device_fini into early and late
  drm/amdgpu: Add early fini callback
  drm/amdgpu: Avoid sysfs dirs removal post device unplug
  drm/amdgpu: Register IOMMU topology notifier per device.
  drm/amdgpu: Fix a bunch of sdma code crash post device unplug

 drivers/gpu/drm/amd/amdgpu/amdgpu.h               | 11 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c        | 82 +++++++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c           |  7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c         | 17 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c          |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h          |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c           | 24 ++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h           |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c           | 12 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c        | 10 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h        |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c           |  7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h          |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c         |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c            |  8 ++-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 +++-
 drivers/gpu/drm/amd/include/amd_shared.h          |  2 +
 drivers/gpu/drm/drm_drv.c                         |  3 +
 drivers/gpu/drm/drm_file.c                        |  8 +++
 drivers/gpu/drm/drm_prime.c                       | 10 +++
 drivers/gpu/drm/etnaviv/etnaviv_sched.c           |  3 +-
 drivers/gpu/drm/lima/lima_sched.c                 |  3 +-
 drivers/gpu/drm/panfrost/panfrost_job.c           |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c            | 18 ++++-
 drivers/gpu/drm/ttm/ttm_bo_vm.c                   | 54 ++++++++++++---
 drivers/gpu/drm/ttm/ttm_page_alloc.c              |  2 +
 drivers/gpu/drm/ttm/ttm_tt.c                      |  1 +
 drivers/gpu/drm/v3d/v3d_sched.c                   | 15 +++--
 include/drm/drm_file.h                            |  2 +
 include/drm/drm_gem.h                             |  2 +
 include/drm/gpu_scheduler.h                       |  6 +-
 31 files changed, 287 insertions(+), 47 deletions(-)

-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Will be used to reroute CPU-mapped BO page faults once
the device is removed.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/drm_file.c  |  8 ++++++++
 drivers/gpu/drm/drm_prime.c | 10 ++++++++++
 include/drm/drm_file.h      |  2 ++
 include/drm/drm_gem.h       |  2 ++
 4 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 0ac4566..ff3d39f 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
 			goto out_prime_destroy;
 	}
 
+	file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!file->dummy_page) {
+		ret = -ENOMEM;
+		goto out_prime_destroy;
+	}
+
 	return file;
 
 out_prime_destroy:
@@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
 	if (dev->driver->postclose)
 		dev->driver->postclose(dev, file);
 
+	__free_page(file->dummy_page);
+
 	drm_prime_destroy_file_private(&file->prime);
 
 	WARN_ON(!list_empty(&file->event_list));
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 1693aa7..987b45c 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
 
 	ret = drm_prime_add_buf_handle(&file_priv->prime,
 			dma_buf, *handle);
+
+	if (!ret) {
+		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+		if (!obj->dummy_page)
+			ret = -ENOMEM;
+	}
+
 	mutex_unlock(&file_priv->prime.lock);
 	if (ret)
 		goto fail;
@@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
 		dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
 	dma_buf = attach->dmabuf;
 	dma_buf_detach(attach->dmabuf, attach);
+
+	__free_page(obj->dummy_page);
+
 	/* remove the reference */
 	dma_buf_put(dma_buf);
 }
diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
index 716990b..2a011fc 100644
--- a/include/drm/drm_file.h
+++ b/include/drm/drm_file.h
@@ -346,6 +346,8 @@ struct drm_file {
 	 */
 	struct drm_prime_file_private prime;
 
+	struct page *dummy_page;
+
 	/* private: */
 #if IS_ENABLED(CONFIG_DRM_LEGACY)
 	unsigned long lock_count; /* DRI1 legacy lock count */
diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
index 337a483..76a97a3 100644
--- a/include/drm/drm_gem.h
+++ b/include/drm/drm_gem.h
@@ -311,6 +311,8 @@ struct drm_gem_object {
 	 *
 	 */
 	const struct drm_gem_object_funcs *funcs;
+
+	struct page *dummy_page;
 };
 
 /**
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 212+ messages in thread

* [PATCH v3 02/12] drm: Unmap the entire device address space on device unplug
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Invalidate all BO CPU mappings once the device is removed.

v3: Move the code from TTM into drm_dev_unplug

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/drm_drv.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
index 13068fd..d550fd5 100644
--- a/drivers/gpu/drm/drm_drv.c
+++ b/drivers/gpu/drm/drm_drv.c
@@ -479,6 +479,9 @@ void drm_dev_unplug(struct drm_device *dev)
 	synchronize_srcu(&drm_unplug_srcu);
 
 	drm_dev_unregister(dev);
+
+	/* Clear all CPU mappings pointing to this device */
+	unmap_mapping_range(dev->anon_inode->i_mapping, 0, 0, 1);
 }
 EXPORT_SYMBOL(drm_dev_unplug);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 212+ messages in thread

* [PATCH v3 03/12] drm/ttm: Remap all page faults to per process dummy page.
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

On device removal, reroute all CPU mappings to the dummy page
kept per drm_file instance or per imported GEM object.

v3:
Remove loop to find DRM file and instead access it
by vma->vm_file->private_data. Move dummy page installation
into a separate function.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 54 +++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 01693e8..f2dbb93 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -35,6 +35,8 @@
 #include <drm/ttm/ttm_bo_driver.h>
 #include <drm/ttm/ttm_placement.h>
 #include <drm/drm_vma_manager.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
 #include <linux/mm.h>
 #include <linux/pfn_t.h>
 #include <linux/rbtree.h>
@@ -420,23 +422,59 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault_reserved);
 
+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf)
+{
+	struct vm_area_struct *vma = vmf->vma;
+	struct ttm_buffer_object *bo = vma->vm_private_data;
+	struct drm_file *file = NULL;
+	struct page *dummy_page = NULL;
+
+	/* We are faulting on imported BO from dma_buf */
+	if (bo->base.dma_buf && bo->base.import_attach) {
+		dummy_page = bo->base.dummy_page;
+	/* We are faulting on local BO */
+	} else {
+		file = vma->vm_file->private_data;
+		dummy_page = file->dummy_page;
+	}
+
+	/* Let do_fault complete the PTE install e.t.c using vmf->page */
+	get_page(dummy_page);
+	vmf->page = dummy_page;
+
+	return 0;
+}
+
 vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	pgprot_t prot;
 	struct ttm_buffer_object *bo = vma->vm_private_data;
 	vm_fault_t ret;
+	int idx;
+	struct drm_device *ddev = bo->base.dev;
 
-	ret = ttm_bo_vm_reserve(bo, vmf);
-	if (ret)
-		return ret;
 
-	prot = vma->vm_page_prot;
-	ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
-	if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
-		return ret;
+	if (!drm_dev_enter(ddev, &idx)) {
+		ret = ttm_bo_vm_dummy_page(vmf);
+		if (ret)
+			return ret;
+	} else {
+		ret = ttm_bo_vm_reserve(bo, vmf);
+		if (ret)
+			goto exit;
 
-	dma_resv_unlock(bo->base.resv);
+		prot = vma->vm_page_prot;
+
+		ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
+		if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
+			goto exit;
+
+		dma_resv_unlock(bo->base.resv);
+
+exit:
+		drm_dev_exit(idx);
+	}
 
 	return ret;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 212+ messages in thread

* [PATCH v3 04/12] drm/ttm: Set dma addr to null after free
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Fixes oops.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
index b40a467..b0df328 100644
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct device *dev, struct ttm_dma_tt *tt)
 		dma_unmap_page(dev, tt->dma_address[i], num_pages * PAGE_SIZE,
 			       DMA_BIDIRECTIONAL);
 
+		tt->dma_address[i] = 0;
+
 		i += num_pages;
 	}
 	ttm_pool_unpopulate(&tt->ttm);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 212+ messages in thread

* [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

This is needed so drivers can drop IOMMU-backed pages on device
unplug, before the device's IOMMU group is released.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/ttm/ttm_tt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 1ccf1ef..29248a5 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
 	else
 		ttm_pool_unpopulate(ttm);
 }
+EXPORT_SYMBOL(ttm_tt_unpopulate);
-- 
2.7.4



* [PATCH v3 06/12] drm/sched: Cancel and flush all outstanding jobs before finish.
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

To avoid any possible use-after-free, synchronously cancel the
timeout work before the scheduler is finalized.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index da24c4e..c3f0bd0 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -891,6 +891,9 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
 	if (sched->thread)
 		kthread_stop(sched->thread);
 
+	/* Confirm no work left behind accessing device structures */
+	cancel_delayed_work_sync(&sched->work_tdr);
+
 	sched->ready = false;
 }
 EXPORT_SYMBOL(drm_sched_fini);
-- 
2.7.4
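The rule the patch enforces is an ordering one: a pending work item holds a pointer back into the scheduler, so it must be cancelled before the scheduler's memory can be considered dead. A threadless userspace sketch of the idiom (the names mimic, but are not, the kernel workqueue API):

```c
#include <assert.h>
#include <stddef.h>

/* A 'delayed work' slot: a callback armed to fire later. */
struct delayed_work {
	void (*fn)(void *);
	void *arg;
};

void schedule_work(struct delayed_work *w, void (*fn)(void *), void *arg)
{
	w->fn = fn;
	w->arg = arg;
}

/* Analogue of cancel_delayed_work_sync(): after this returns, the
 * callback is guaranteed not to run, so 'arg' may safely be freed. */
void cancel_work(struct delayed_work *w)
{
	w->fn = NULL;
	w->arg = NULL;
}

/* Timer expiry: runs the callback only if it is still armed. */
void fire_work(struct delayed_work *w)
{
	if (w->fn)
		w->fn(w->arg);
}

int fired;	/* demo counter for the callback below */
void bump(void *arg) { (*(int *)arg)++; }
```

In drm_sched_fini() terms, cancel_work() corresponds to cancelling work_tdr before `sched` is torn down.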



* [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

There is no point in attempting job recovery once the device is gone.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
 drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
 drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
 drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
 drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
 include/drm/gpu_scheduler.h               |  6 +++++-
 7 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d56f402..d0b0021 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
 
 		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
 				   num_hw_submission, amdgpu_job_hang_limit,
-				   timeout, ring->name);
+				   timeout, ring->name, &adev->ddev);
 		if (r) {
 			DRM_ERROR("Failed to create scheduler on ring %s.\n",
 				  ring->name);
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
index cd46c88..7678287 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
@@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
 
 	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
 			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
-			     msecs_to_jiffies(500), dev_name(gpu->dev));
+			     msecs_to_jiffies(500), dev_name(gpu->dev),
+			     gpu->drm);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
index dc6df9e..8a7e5d7ca 100644
--- a/drivers/gpu/drm/lima/lima_sched.c
+++ b/drivers/gpu/drm/lima/lima_sched.c
@@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
 
 	return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
 			      lima_job_hang_limit, msecs_to_jiffies(timeout),
-			      name);
+			      name,
+			      pipe->ldev->ddev);
 }
 
 void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
index 30e7b71..37b03b01 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
 		ret = drm_sched_init(&js->queue[j].sched,
 				     &panfrost_sched_ops,
 				     1, 0, msecs_to_jiffies(500),
-				     "pan_js");
+				     "pan_js", pfdev->ddev);
 		if (ret) {
 			dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
 			goto err_sched;
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index c3f0bd0..95db8c6 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -53,6 +53,7 @@
 #include <drm/drm_print.h>
 #include <drm/gpu_scheduler.h>
 #include <drm/spsc_queue.h>
+#include <drm/drm_drv.h>
 
 #define CREATE_TRACE_POINTS
 #include "gpu_scheduler_trace.h"
@@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	struct drm_gpu_scheduler *sched;
 	struct drm_sched_job *job;
 
+	int idx;
+
 	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
+	if (!drm_dev_enter(sched->ddev, &idx)) {
+		DRM_INFO("%s - device unplugged skipping recovery on scheduler:%s",
+			 __func__, sched->name);
+		return;
+	}
+
 	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
 	spin_lock(&sched->job_list_lock);
 	job = list_first_entry_or_null(&sched->ring_mirror_list,
@@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
 	spin_lock(&sched->job_list_lock);
 	drm_sched_start_timeout(sched);
 	spin_unlock(&sched->job_list_lock);
+
+	drm_dev_exit(idx);
 }
 
  /**
@@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   unsigned hw_submission,
 		   unsigned hang_limit,
 		   long timeout,
-		   const char *name)
+		   const char *name,
+		   struct drm_device *ddev)
 {
 	int i, ret;
 	sched->ops = ops;
@@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
 	sched->name = name;
 	sched->timeout = timeout;
 	sched->hang_limit = hang_limit;
+	sched->ddev = ddev;
 	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
 		drm_sched_rq_init(sched, &sched->sched_rq[i]);
 
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
index 0747614..f5076e5 100644
--- a/drivers/gpu/drm/v3d/v3d_sched.c
+++ b/drivers/gpu/drm/v3d/v3d_sched.c
@@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 			     &v3d_bin_sched_ops,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms),
-			     "v3d_bin");
+			     "v3d_bin",
+			     &v3d->drm);
 	if (ret) {
 		dev_err(v3d->drm.dev, "Failed to create bin scheduler: %d.", ret);
 		return ret;
@@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 			     &v3d_render_sched_ops,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms),
-			     "v3d_render");
+			     "v3d_render",
+			     &v3d->drm);
 	if (ret) {
 		dev_err(v3d->drm.dev, "Failed to create render scheduler: %d.",
 			ret);
@@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 			     &v3d_tfu_sched_ops,
 			     hw_jobs_limit, job_hang_limit,
 			     msecs_to_jiffies(hang_limit_ms),
-			     "v3d_tfu");
+			     "v3d_tfu",
+			     &v3d->drm);
 	if (ret) {
 		dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
 			ret);
@@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 				     &v3d_csd_sched_ops,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms),
-				     "v3d_csd");
+				     "v3d_csd",
+				     &v3d->drm);
 		if (ret) {
 			dev_err(v3d->drm.dev, "Failed to create CSD scheduler: %d.",
 				ret);
@@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
 				     &v3d_cache_clean_sched_ops,
 				     hw_jobs_limit, job_hang_limit,
 				     msecs_to_jiffies(hang_limit_ms),
-				     "v3d_cache_clean");
+				     "v3d_cache_clean",
+				     &v3d->drm);
 		if (ret) {
 			dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN scheduler: %d.",
 				ret);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 9243655..a980709 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -32,6 +32,7 @@
 
 struct drm_gpu_scheduler;
 struct drm_sched_rq;
+struct drm_device;
 
 /* These are often used as an (initial) index
  * to an array, and as such should start at 0.
@@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
  * @score: score to help loadbalancer pick a idle sched
  * @ready: marks if the underlying HW is ready to work
  * @free_guilty: A hit to time out handler to free the guilty job.
+ * @ddev: Pointer to drm device of this scheduler.
  *
  * One scheduler is implemented for each hardware ring.
  */
@@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
 	atomic_t                        score;
 	bool				ready;
 	bool				free_guilty;
+	struct drm_device		*ddev;
 };
 
 int drm_sched_init(struct drm_gpu_scheduler *sched,
 		   const struct drm_sched_backend_ops *ops,
 		   uint32_t hw_submission, unsigned hang_limit, long timeout,
-		   const char *name);
+		   const char *name,
+		   struct drm_device *ddev);
 
 void drm_sched_fini(struct drm_gpu_scheduler *sched);
 int drm_sched_job_init(struct drm_sched_job *job,
-- 
2.7.4
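drm_dev_enter()/drm_dev_exit() delimit a revocable critical section: once the device is unplugged, every new enter attempt fails and the handler returns early instead of touching freed device state. A single-threaded sketch of the control flow (the real implementation synchronizes with SRCU; these names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>

struct dev_like {
	bool unplugged;
	int active;	/* in-flight critical sections */
};

/* Analogue of drm_dev_enter(): false once the device is gone. */
bool dev_enter(struct dev_like *d)
{
	if (d->unplugged)
		return false;
	d->active++;
	return true;	/* caller must pair with dev_exit() */
}

void dev_exit(struct dev_like *d)
{
	d->active--;
}

/* Analogue of drm_dev_unplug(); the real one also waits for
 * in-flight sections to drain before teardown proceeds. */
void dev_unplug(struct dev_like *d)
{
	d->unplugged = true;
}

/* Shape of the guarded timeout handler from the patch. */
void job_timedout(struct dev_like *d, int *recoveries)
{
	if (!dev_enter(d))
		return;		/* device unplugged: skip recovery */
	(*recoveries)++;	/* ...actual job recovery would run here... */
	dev_exit(d);
}
```

Before unplug the handler performs recovery; afterwards it becomes a no-op, which is exactly the bail-out drm_sched_job_timedout() gains here.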



* [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Some of the work in amdgpu_device_fini, such as disabling HW
interrupts and finalizing pending fences, must be done right away
on pci_remove, while most of the work that finalizes and releases
driver data structures can be deferred until the drm_driver.release
hook is called, i.e. when the last device reference is dropped.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 ++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 12 +++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
 9 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 83ac06a..6243f6d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1063,7 +1063,9 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
 
 int amdgpu_device_init(struct amdgpu_device *adev,
 		       uint32_t flags);
-void amdgpu_device_fini(struct amdgpu_device *adev);
+void amdgpu_device_fini_early(struct amdgpu_device *adev);
+void amdgpu_device_fini_late(struct amdgpu_device *adev);
+
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
 
 void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
@@ -1275,6 +1277,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
 int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
 void amdgpu_driver_postclose_kms(struct drm_device *dev,
 				 struct drm_file *file_priv);
+void amdgpu_driver_release_kms(struct drm_device *dev);
+
 int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
 int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2f60b70..797d94d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3557,14 +3557,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  * Tear down the driver info (all asics).
  * Called at driver shutdown.
  */
-void amdgpu_device_fini(struct amdgpu_device *adev)
+void amdgpu_device_fini_early(struct amdgpu_device *adev)
 {
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
 	adev->shutdown = true;
 
-	kfree(adev->pci_state);
-
 	/* make sure IB test finished before entering exclusive mode
 	 * to avoid preemption on IB test
 	 * */
@@ -3581,11 +3579,18 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 		else
 			drm_atomic_helper_shutdown(adev_to_drm(adev));
 	}
-	amdgpu_fence_driver_fini(adev);
+	amdgpu_fence_driver_fini_early(adev);
 	if (adev->pm_sysfs_en)
 		amdgpu_pm_sysfs_fini(adev);
 	amdgpu_fbdev_fini(adev);
+
+	amdgpu_irq_fini_early(adev);
+}
+
+void amdgpu_device_fini_late(struct amdgpu_device *adev)
+{
 	amdgpu_device_ip_fini(adev);
+	amdgpu_fence_driver_fini_late(adev);
 	release_firmware(adev->firmware.gpu_info_fw);
 	adev->firmware.gpu_info_fw = NULL;
 	adev->accel_working = false;
@@ -3621,6 +3626,9 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 		amdgpu_pmu_fini(adev);
 	if (adev->mman.discovery_bin)
 		amdgpu_discovery_fini(adev);
+
+	kfree(adev->pci_state);
+
 }
 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 7f98cf1..3d130fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1244,14 +1244,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
 
-#ifdef MODULE
-	if (THIS_MODULE->state != MODULE_STATE_GOING)
-#endif
-		DRM_ERROR("Hotplug removal is not supported\n");
 	drm_dev_unplug(dev);
 	amdgpu_driver_unload_kms(dev);
+
 	pci_disable_device(pdev);
-	pci_set_drvdata(pdev, NULL);
 	drm_dev_put(dev);
 }
 
@@ -1557,6 +1553,7 @@ static struct drm_driver kms_driver = {
 	.dumb_create = amdgpu_mode_dumb_create,
 	.dumb_map_offset = amdgpu_mode_dumb_mmap,
 	.fops = &amdgpu_driver_kms_fops,
+	.release = &amdgpu_driver_release_kms,
 
 	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d0b0021..c123aa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -523,7 +523,7 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
  *
  * Tear down the fence driver for all possible rings (all asics).
  */
-void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
+void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev)
 {
 	unsigned i, j;
 	int r;
@@ -544,6 +544,19 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
 		if (!ring->no_scheduler)
 			drm_sched_fini(&ring->sched);
 		del_timer_sync(&ring->fence_drv.fallback_timer);
+	}
+}
+
+void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->fence_drv.initialized)
+			continue;
+
 		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
 			dma_fence_put(ring->fence_drv.fences[j]);
 		kfree(ring->fence_drv.fences);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 300ac73..a833197 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -49,6 +49,7 @@
 #include <drm/drm_irq.h>
 #include <drm/drm_vblank.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu.h"
 #include "amdgpu_ih.h"
 #include "atom.h"
@@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	return 0;
 }
 
+
+void amdgpu_irq_fini_early(struct amdgpu_device *adev)
+{
+	if (adev->irq.installed) {
+		drm_irq_uninstall(&adev->ddev);
+		adev->irq.installed = false;
+		if (adev->irq.msi_enabled)
+			pci_free_irq_vectors(adev->pdev);
+
+		if (!amdgpu_device_has_dc_support(adev))
+			flush_work(&adev->hotplug_work);
+	}
+}
+
 /**
  * amdgpu_irq_fini - shut down interrupt handling
  *
@@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
 {
 	unsigned i, j;
 
-	if (adev->irq.installed) {
-		drm_irq_uninstall(adev_to_drm(adev));
-		adev->irq.installed = false;
-		if (adev->irq.msi_enabled)
-			pci_free_irq_vectors(adev->pdev);
-		if (!amdgpu_device_has_dc_support(adev))
-			flush_work(&adev->hotplug_work);
-	}
-
 	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
 		if (!adev->irq.client[i].sources)
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index c718e94..718c70f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
 
 int amdgpu_irq_init(struct amdgpu_device *adev);
 void amdgpu_irq_fini(struct amdgpu_device *adev);
+void amdgpu_irq_fini_early(struct amdgpu_device *adev);
 int amdgpu_irq_add_id(struct amdgpu_device *adev,
 		      unsigned client_id, unsigned src_id,
 		      struct amdgpu_irq_src *source);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index a0af8a7..9e30c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -29,6 +29,7 @@
 #include "amdgpu.h"
 #include <drm/drm_debugfs.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu_sched.h"
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
@@ -94,7 +95,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 	}
 
 	amdgpu_acpi_fini(adev);
-	amdgpu_device_fini(adev);
+	amdgpu_device_fini_early(adev);
 }
 
 void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
@@ -1147,6 +1148,15 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	pm_runtime_put_autosuspend(dev->dev);
 }
 
+
+void amdgpu_driver_release_kms(struct drm_device *dev)
+{
+	struct amdgpu_device *adev = drm_to_adev(dev);
+
+	amdgpu_device_fini_late(adev);
+	pci_set_drvdata(adev->pdev, NULL);
+}
+
 /*
  * VBlank related functions.
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 9d11b84..caf828a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2142,9 +2142,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
 
+	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
+
 	if (!con)
 		return 0;
 
+
 	/* Need disable ras on all IPs here before ip [hw/sw]fini */
 	amdgpu_ras_disable_all_features(adev, 0);
 	amdgpu_ras_recovery_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7112137..074f36b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -107,7 +107,8 @@ struct amdgpu_fence_driver {
 };
 
 int amdgpu_fence_driver_init(struct amdgpu_device *adev);
-void amdgpu_fence_driver_fini(struct amdgpu_device *adev);
+void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev);
+void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev);
 void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
 
 int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
-- 
2.7.4
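The split mirrors the two lifetimes involved: hardware access must stop at pci_remove, but software state can only be freed once the last reference to the drm_device is dropped. A compact sketch of that two-phase teardown (illustrative names, not the amdgpu functions):

```c
#include <assert.h>

struct dev_state {
	int refcount;
	int hw_stopped;	/* set by the early phase, at pci_remove time */
	int freed;	/* set by the late phase, at last reference drop */
};

void fini_early(struct dev_state *d) { d->hw_stopped = 1; }
static void fini_late(struct dev_state *d) { d->freed = 1; }

/* Analogue of drm_dev_put(): the late phase runs only at the last put. */
void dev_put(struct dev_state *d)
{
	if (--d->refcount == 0)
		fini_late(d);
}

/* pci_remove analogue: quiesce the hardware immediately, then drop the
 * driver's reference; an open DRM file may still hold another one. */
void pci_remove_like(struct dev_state *d)
{
	fini_early(d);
	dev_put(d);
}
```

The hardware stops at remove time, but the free happens only after the last holder lets go, matching the amdgpu_device_fini_early()/amdgpu_device_fini_late() split.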


* [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh, ppaalanen, Harry.Wentland, Andrey Grodzovsky

Some of the work in amdgpu_device_fini, such as disabling HW interrupts
and finalizing pending fences, must be done right away on pci_remove,
while most of the work related to finalizing and releasing driver
data structures can be deferred until the drm_driver.release hook is
called, i.e. when the last device reference is dropped.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 ++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++++++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 12 +++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
 9 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 83ac06a..6243f6d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1063,7 +1063,9 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
 
 int amdgpu_device_init(struct amdgpu_device *adev,
 		       uint32_t flags);
-void amdgpu_device_fini(struct amdgpu_device *adev);
+void amdgpu_device_fini_early(struct amdgpu_device *adev);
+void amdgpu_device_fini_late(struct amdgpu_device *adev);
+
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
 
 void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
@@ -1275,6 +1277,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
 int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
 void amdgpu_driver_postclose_kms(struct drm_device *dev,
 				 struct drm_file *file_priv);
+void amdgpu_driver_release_kms(struct drm_device *dev);
+
 int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
 int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2f60b70..797d94d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3557,14 +3557,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  * Tear down the driver info (all asics).
  * Called at driver shutdown.
  */
-void amdgpu_device_fini(struct amdgpu_device *adev)
+void amdgpu_device_fini_early(struct amdgpu_device *adev)
 {
 	dev_info(adev->dev, "amdgpu: finishing device.\n");
 	flush_delayed_work(&adev->delayed_init_work);
 	adev->shutdown = true;
 
-	kfree(adev->pci_state);
-
 	/* make sure IB test finished before entering exclusive mode
 	 * to avoid preemption on IB test
 	 * */
@@ -3581,11 +3579,18 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 		else
 			drm_atomic_helper_shutdown(adev_to_drm(adev));
 	}
-	amdgpu_fence_driver_fini(adev);
+	amdgpu_fence_driver_fini_early(adev);
 	if (adev->pm_sysfs_en)
 		amdgpu_pm_sysfs_fini(adev);
 	amdgpu_fbdev_fini(adev);
+
+	amdgpu_irq_fini_early(adev);
+}
+
+void amdgpu_device_fini_late(struct amdgpu_device *adev)
+{
 	amdgpu_device_ip_fini(adev);
+	amdgpu_fence_driver_fini_late(adev);
 	release_firmware(adev->firmware.gpu_info_fw);
 	adev->firmware.gpu_info_fw = NULL;
 	adev->accel_working = false;
@@ -3621,6 +3626,9 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
 		amdgpu_pmu_fini(adev);
 	if (adev->mman.discovery_bin)
 		amdgpu_discovery_fini(adev);
+
+	kfree(adev->pci_state);
+
 }
 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 7f98cf1..3d130fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1244,14 +1244,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
 {
 	struct drm_device *dev = pci_get_drvdata(pdev);
 
-#ifdef MODULE
-	if (THIS_MODULE->state != MODULE_STATE_GOING)
-#endif
-		DRM_ERROR("Hotplug removal is not supported\n");
 	drm_dev_unplug(dev);
 	amdgpu_driver_unload_kms(dev);
+
 	pci_disable_device(pdev);
-	pci_set_drvdata(pdev, NULL);
 	drm_dev_put(dev);
 }
 
@@ -1557,6 +1553,7 @@ static struct drm_driver kms_driver = {
 	.dumb_create = amdgpu_mode_dumb_create,
 	.dumb_map_offset = amdgpu_mode_dumb_mmap,
 	.fops = &amdgpu_driver_kms_fops,
+	.release = &amdgpu_driver_release_kms,
 
 	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
 	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index d0b0021..c123aa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -523,7 +523,7 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
  *
  * Tear down the fence driver for all possible rings (all asics).
  */
-void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
+void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev)
 {
 	unsigned i, j;
 	int r;
@@ -544,6 +544,19 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
 		if (!ring->no_scheduler)
 			drm_sched_fini(&ring->sched);
 		del_timer_sync(&ring->fence_drv.fallback_timer);
+	}
+}
+
+void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev)
+{
+	unsigned int i, j;
+
+	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
+		struct amdgpu_ring *ring = adev->rings[i];
+
+		if (!ring || !ring->fence_drv.initialized)
+			continue;
+
 		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
 			dma_fence_put(ring->fence_drv.fences[j]);
 		kfree(ring->fence_drv.fences);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index 300ac73..a833197 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -49,6 +49,7 @@
 #include <drm/drm_irq.h>
 #include <drm/drm_vblank.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu.h"
 #include "amdgpu_ih.h"
 #include "atom.h"
@@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
 	return 0;
 }
 
+
+void amdgpu_irq_fini_early(struct amdgpu_device *adev)
+{
+	if (adev->irq.installed) {
+		drm_irq_uninstall(&adev->ddev);
+		adev->irq.installed = false;
+		if (adev->irq.msi_enabled)
+			pci_free_irq_vectors(adev->pdev);
+
+		if (!amdgpu_device_has_dc_support(adev))
+			flush_work(&adev->hotplug_work);
+	}
+}
+
 /**
  * amdgpu_irq_fini - shut down interrupt handling
  *
@@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
 {
 	unsigned i, j;
 
-	if (adev->irq.installed) {
-		drm_irq_uninstall(adev_to_drm(adev));
-		adev->irq.installed = false;
-		if (adev->irq.msi_enabled)
-			pci_free_irq_vectors(adev->pdev);
-		if (!amdgpu_device_has_dc_support(adev))
-			flush_work(&adev->hotplug_work);
-	}
-
 	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
 		if (!adev->irq.client[i].sources)
 			continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
index c718e94..718c70f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
@@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
 
 int amdgpu_irq_init(struct amdgpu_device *adev);
 void amdgpu_irq_fini(struct amdgpu_device *adev);
+void amdgpu_irq_fini_early(struct amdgpu_device *adev);
 int amdgpu_irq_add_id(struct amdgpu_device *adev,
 		      unsigned client_id, unsigned src_id,
 		      struct amdgpu_irq_src *source);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index a0af8a7..9e30c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -29,6 +29,7 @@
 #include "amdgpu.h"
 #include <drm/drm_debugfs.h>
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu_sched.h"
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
@@ -94,7 +95,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
 	}
 
 	amdgpu_acpi_fini(adev);
-	amdgpu_device_fini(adev);
+	amdgpu_device_fini_early(adev);
 }
 
 void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
@@ -1147,6 +1148,15 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	pm_runtime_put_autosuspend(dev->dev);
 }
 
+
+void amdgpu_driver_release_kms(struct drm_device *dev)
+{
+	struct amdgpu_device *adev = drm_to_adev(dev);
+
+	amdgpu_device_fini_late(adev);
+	pci_set_drvdata(adev->pdev, NULL);
+}
+
 /*
  * VBlank related functions.
  */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 9d11b84..caf828a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2142,9 +2142,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
 {
 	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
 
+	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
+
 	if (!con)
 		return 0;
 
+
 	/* Need disable ras on all IPs here before ip [hw/sw]fini */
 	amdgpu_ras_disable_all_features(adev, 0);
 	amdgpu_ras_recovery_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7112137..074f36b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -107,7 +107,8 @@ struct amdgpu_fence_driver {
 };
 
 int amdgpu_fence_driver_init(struct amdgpu_device *adev);
-void amdgpu_fence_driver_fini(struct amdgpu_device *adev);
+void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev);
+void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev);
 void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
 
 int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 212+ messages in thread

* [PATCH v3 09/12] drm/amdgpu: Add early fini callback
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Use it to call display code that depends on device->drv_data
before it is set to NULL on device unplug.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c        | 20 ++++++++++++++++++++
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 ++++++++++--
 drivers/gpu/drm/amd/include/amd_shared.h          |  2 ++
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 797d94d..96368a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2508,6 +2508,24 @@ static int amdgpu_device_ip_late_init(struct amdgpu_device *adev)
 	return 0;
 }
 
+static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
+{
+	int i, r;
+
+	for (i = 0; i < adev->num_ip_blocks; i++) {
+		if (!adev->ip_blocks[i].version->funcs->early_fini)
+			continue;
+
+		r = adev->ip_blocks[i].version->funcs->early_fini((void *)adev);
+		if (r) {
+			DRM_DEBUG("early_fini of IP block <%s> failed %d\n",
+				  adev->ip_blocks[i].version->funcs->name, r);
+		}
+	}
+
+	return 0;
+}
+
 /**
  * amdgpu_device_ip_fini - run fini for hardware IPs
  *
@@ -3585,6 +3603,8 @@ void amdgpu_device_fini_early(struct amdgpu_device *adev)
 	amdgpu_fbdev_fini(adev);
 
 	amdgpu_irq_fini_early(adev);
+
+	amdgpu_device_ip_fini_early(adev);
 }
 
 void amdgpu_device_fini_late(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 1da4ad5..278d1f6 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1158,6 +1158,15 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
 	return -EINVAL;
 }
 
+static int amdgpu_dm_early_fini(void *handle)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+	amdgpu_dm_audio_fini(adev);
+
+	return 0;
+}
+
 static void amdgpu_dm_fini(struct amdgpu_device *adev)
 {
 	int i;
@@ -1166,8 +1175,6 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
 		drm_encoder_cleanup(&adev->dm.mst_encoders[i].base);
 	}
 
-	amdgpu_dm_audio_fini(adev);
-
 	amdgpu_dm_destroy_drm_device(&adev->dm);
 
 #ifdef CONFIG_DRM_AMD_DC_HDCP
@@ -2150,6 +2157,7 @@ static const struct amd_ip_funcs amdgpu_dm_funcs = {
 	.late_init = dm_late_init,
 	.sw_init = dm_sw_init,
 	.sw_fini = dm_sw_fini,
+	.early_fini = amdgpu_dm_early_fini,
 	.hw_init = dm_hw_init,
 	.hw_fini = dm_hw_fini,
 	.suspend = dm_suspend,
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
index 9676016..63bb846 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -239,6 +239,7 @@ enum amd_dpm_forced_level;
  * @late_init: sets up late driver/hw state (post hw_init) - Optional
  * @sw_init: sets up driver state, does not configure hw
  * @sw_fini: tears down driver state, does not configure hw
+ * @early_fini: tears down stuff before dev detached from driver
  * @hw_init: sets up the hw state
  * @hw_fini: tears down the hw state
  * @late_fini: final cleanup
@@ -267,6 +268,7 @@ struct amd_ip_funcs {
 	int (*late_init)(void *handle);
 	int (*sw_init)(void *handle);
 	int (*sw_fini)(void *handle);
+	int (*early_fini)(void *handle);
 	int (*hw_init)(void *handle);
 	int (*hw_fini)(void *handle);
 	void (*late_fini)(void *handle);
-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 212+ messages in thread

* [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Avoids a NULL pointer dereference due to kobj->sd being unset on device removal.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index caf828a..812e592 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -27,6 +27,7 @@
 #include <linux/uaccess.h>
 #include <linux/reboot.h>
 #include <linux/syscalls.h>
+#include <drm/drm_drv.h>
 
 #include "amdgpu.h"
 #include "amdgpu_ras.h"
@@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
 		.attrs = attrs,
 	};
 
-	sysfs_remove_group(&adev->dev->kobj, &group);
+	if (!drm_dev_is_unplugged(&adev->ddev))
+		sysfs_remove_group(&adev->dev->kobj, &group);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index 2b7c90b..54331fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -24,6 +24,7 @@
 #include <linux/firmware.h>
 #include <linux/slab.h>
 #include <linux/module.h>
+#include <drm/drm_drv.h>
 
 #include "amdgpu.h"
 #include "amdgpu_ucode.h"
@@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
 
 void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
 {
-	sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
+	if (!drm_dev_is_unplugged(&adev->ddev))
+		sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
 }
 
 static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 212+ messages in thread

* [PATCH v3 11/12] drm/amdgpu: Register IOMMU topology notifier per device.
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Handle all DMA IOMMU group related dependencies before the
group is removed.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  5 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 46 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 +++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 ++
 6 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6243f6d..c41957e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -51,6 +51,7 @@
 #include <linux/dma-fence.h>
 #include <linux/pci.h>
 #include <linux/aer.h>
+#include <linux/notifier.h>
 
 #include <drm/ttm/ttm_bo_api.h>
 #include <drm/ttm/ttm_bo_driver.h>
@@ -1044,6 +1045,10 @@ struct amdgpu_device {
 
 	bool                            in_pci_err_recovery;
 	struct pci_saved_state          *pci_state;
+
+	struct notifier_block		nb;
+	struct blocking_notifier_head	notifier;
+	struct list_head		device_bo_list;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 96368a8..bc84c20 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -70,6 +70,8 @@
 #include <drm/task_barrier.h>
 #include <linux/pm_runtime.h>
 
+#include <linux/iommu.h>
+
 MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -3179,6 +3181,39 @@ static const struct attribute *amdgpu_dev_attributes[] = {
 };
 
 
+static int amdgpu_iommu_group_notifier(struct notifier_block *nb,
+				     unsigned long action, void *data)
+{
+	struct amdgpu_device *adev = container_of(nb, struct amdgpu_device, nb);
+	struct amdgpu_bo *bo = NULL;
+
+	/*
+	 * Following is a set of IOMMU group dependencies taken care of before
+	 * device's IOMMU group is removed
+	 */
+	if (action == IOMMU_GROUP_NOTIFY_DEL_DEVICE) {
+
+		spin_lock(&ttm_bo_glob.lru_lock);
+		list_for_each_entry(bo, &adev->device_bo_list, bo) {
+			if (bo->tbo.ttm)
+				ttm_tt_unpopulate(bo->tbo.ttm);
+		}
+		spin_unlock(&ttm_bo_glob.lru_lock);
+
+		if (adev->irq.ih.use_bus_addr)
+			amdgpu_ih_ring_fini(adev, &adev->irq.ih);
+		if (adev->irq.ih1.use_bus_addr)
+			amdgpu_ih_ring_fini(adev, &adev->irq.ih1);
+		if (adev->irq.ih2.use_bus_addr)
+			amdgpu_ih_ring_fini(adev, &adev->irq.ih2);
+
+		amdgpu_gart_dummy_page_fini(adev);
+	}
+
+	return NOTIFY_OK;
+}
+
+
 /**
  * amdgpu_device_init - initialize the driver
  *
@@ -3283,6 +3318,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 
 	INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
 
+	INIT_LIST_HEAD(&adev->device_bo_list);
+
 	adev->gfx.gfx_off_req_count = 1;
 	adev->pm.ac_power = power_supply_is_system_supplied() > 0;
 
@@ -3553,6 +3590,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 	if (amdgpu_device_cache_pci_state(adev->pdev))
 		pci_restore_state(pdev);
 
+	BLOCKING_INIT_NOTIFIER_HEAD(&adev->notifier);
+	adev->nb.notifier_call = amdgpu_iommu_group_notifier;
+
+	if (adev->dev->iommu_group) {
+		r = iommu_group_register_notifier(adev->dev->iommu_group, &adev->nb);
+		if (r)
+			goto failed;
+	}
+
 	return 0;
 
 failed:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index e01e681..34c17bd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev)
  *
  * Frees the dummy page used by the driver (all asics).
  */
-static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
 {
 	if (!adev->dummy_page_addr)
 		return;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index afa2e28..5678d9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
 void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
 int amdgpu_gart_init(struct amdgpu_device *adev);
 void amdgpu_gart_fini(struct amdgpu_device *adev);
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev);
 int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
 		       int pages);
 int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b191701..731c9889 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
 	}
 	amdgpu_bo_unref(&bo->parent);
 
+	spin_lock(&ttm_bo_glob.lru_lock);
+	list_del(&bo->bo);
+	spin_unlock(&ttm_bo_glob.lru_lock);
+
 	kfree(bo->metadata);
 	kfree(bo);
 }
@@ -616,6 +620,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 	if (bp->type == ttm_bo_type_device)
 		bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
 
+	INIT_LIST_HEAD(&bo->bo);
+
+	spin_lock(&ttm_bo_glob.lru_lock);
+	list_add_tail(&bo->bo, &adev->device_bo_list);
+	spin_unlock(&ttm_bo_glob.lru_lock);
+
 	return 0;
 
 fail_unreserve:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 621c0bf..b53b7e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -112,6 +112,8 @@ struct amdgpu_bo {
 	struct list_head		shadow_list;
 
 	struct kgd_mem                  *kfd_bo;
+
+	struct list_head 		bo;
 };
 
 static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 212+ messages in thread

+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev)
  *
  * Frees the dummy page used by the driver (all asics).
  */
-static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
 {
 	if (!adev->dummy_page_addr)
 		return;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index afa2e28..5678d9c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
 void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
 int amdgpu_gart_init(struct amdgpu_device *adev);
 void amdgpu_gart_fini(struct amdgpu_device *adev);
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev);
 int amdgpu_gart_unbind(struct amdgpu_device *adev, uint64_t offset,
 		       int pages);
 int amdgpu_gart_map(struct amdgpu_device *adev, uint64_t offset,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b191701..731c9889 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -94,6 +94,10 @@ static void amdgpu_bo_destroy(struct ttm_buffer_object *tbo)
 	}
 	amdgpu_bo_unref(&bo->parent);
 
+	spin_lock(&ttm_bo_glob.lru_lock);
+	list_del(&bo->bo);
+	spin_unlock(&ttm_bo_glob.lru_lock);
+
 	kfree(bo->metadata);
 	kfree(bo);
 }
@@ -616,6 +620,12 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
 	if (bp->type == ttm_bo_type_device)
 		bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
 
+	INIT_LIST_HEAD(&bo->bo);
+
+	spin_lock(&ttm_bo_glob.lru_lock);
+	list_add_tail(&bo->bo, &adev->device_bo_list);
+	spin_unlock(&ttm_bo_glob.lru_lock);
+
 	return 0;
 
 fail_unreserve:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 621c0bf..b53b7e0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -112,6 +112,8 @@ struct amdgpu_bo {
 	struct list_head		shadow_list;
 
 	struct kgd_mem                  *kfd_bo;
+
+	struct list_head 		bo;
 };
 
 static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 212+ messages in thread

* [PATCH v3 12/12] drm/amdgpu: Fix a bunch of sdma code crashes post device unplug
  2020-11-21  5:21 ` Andrey Grodzovsky
@ 2020-11-21  5:21   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-21  5:21 UTC (permalink / raw)
  To: amd-gfx, dri-devel, ckoenig.leichtzumerken, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

We can't allocate and submit IBs post device unplug.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index fdbe7d4..a62ad20 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -31,6 +31,7 @@
 #include <linux/dma-buf.h>
 
 #include <drm/amdgpu_drm.h>
+#include <drm/drm_drv.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
@@ -1602,7 +1603,10 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	struct amdgpu_vm_update_params params;
 	enum amdgpu_sync_mode sync_mode;
 	uint64_t pfn;
-	int r;
+	int r, idx;
+
+	if (!drm_dev_enter(&adev->ddev, &idx))
+		return -ENOENT;
 
 	memset(&params, 0, sizeof(params));
 	params.adev = adev;
@@ -1645,6 +1649,8 @@ static int amdgpu_vm_bo_update_mapping(struct amdgpu_device *adev,
 	if (r)
 		goto error_unlock;
 
+
+	drm_dev_exit(idx);
 	do {
 		uint64_t tmp, num_entries, addr;
 
-- 
2.7.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH v3 04/12] drm/ttm: Set dma addr to null after free
  2020-11-21  5:21   ` Andrey Grodzovsky
@ 2020-11-21 14:13     ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-21 14:13 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> Fixes oops.

That file doesn't even exist any more. What oops should this fix?

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> index b40a467..b0df328 100644
> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
> @@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct device *dev, struct ttm_dma_tt *tt)
>   		dma_unmap_page(dev, tt->dma_address[i], num_pages * PAGE_SIZE,
>   			       DMA_BIDIRECTIONAL);
>   
> +		tt->dma_address[i] = 0;
> +
>   		i += num_pages;
>   	}
>   	ttm_pool_unpopulate(&tt->ttm);


* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2020-11-21  5:21   ` Andrey Grodzovsky
@ 2020-11-21 14:15     ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-21 14:15 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> Will be used to reroute CPU mapped BO's page faults once
> device is removed.

Uff, one page for each exported DMA-buf? That's not something we can do.

We need to find a different approach here.

Can't we call alloc_page() on each fault and link them together so they 
are freed when the device is finally reaped?

Regards,
Christian.

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>   include/drm/drm_file.h      |  2 ++
>   include/drm/drm_gem.h       |  2 ++
>   4 files changed, 22 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index 0ac4566..ff3d39f 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>   			goto out_prime_destroy;
>   	}
>   
> +	file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +	if (!file->dummy_page) {
> +		ret = -ENOMEM;
> +		goto out_prime_destroy;
> +	}
> +
>   	return file;
>   
>   out_prime_destroy:
> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>   	if (dev->driver->postclose)
>   		dev->driver->postclose(dev, file);
>   
> +	__free_page(file->dummy_page);
> +
>   	drm_prime_destroy_file_private(&file->prime);
>   
>   	WARN_ON(!list_empty(&file->event_list));
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 1693aa7..987b45c 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>   
>   	ret = drm_prime_add_buf_handle(&file_priv->prime,
>   			dma_buf, *handle);
> +
> +	if (!ret) {
> +		obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> +		if (!obj->dummy_page)
> +			ret = -ENOMEM;
> +	}
> +
>   	mutex_unlock(&file_priv->prime.lock);
>   	if (ret)
>   		goto fail;
> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, struct sg_table *sg)
>   		dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>   	dma_buf = attach->dmabuf;
>   	dma_buf_detach(attach->dmabuf, attach);
> +
> +	__free_page(obj->dummy_page);
> +
>   	/* remove the reference */
>   	dma_buf_put(dma_buf);
>   }
> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> index 716990b..2a011fc 100644
> --- a/include/drm/drm_file.h
> +++ b/include/drm/drm_file.h
> @@ -346,6 +346,8 @@ struct drm_file {
>   	 */
>   	struct drm_prime_file_private prime;
>   
> +	struct page *dummy_page;
> +
>   	/* private: */
>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>   	unsigned long lock_count; /* DRI1 legacy lock count */
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 337a483..76a97a3 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -311,6 +311,8 @@ struct drm_gem_object {
>   	 *
>   	 */
>   	const struct drm_gem_object_funcs *funcs;
> +
> +	struct page *dummy_page;
>   };
>   
>   /**


* Re: [PATCH v3 02/12] drm: Unmap the entire device address space on device unplug
  2020-11-21  5:21   ` Andrey Grodzovsky
@ 2020-11-21 14:16     ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-21 14:16 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> Invalidate all BOs CPU mappings once device is removed.
>
> v3: Move the code from TTM into drm_dev_unplug
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/drm_drv.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> index 13068fd..d550fd5 100644
> --- a/drivers/gpu/drm/drm_drv.c
> +++ b/drivers/gpu/drm/drm_drv.c
> @@ -479,6 +479,9 @@ void drm_dev_unplug(struct drm_device *dev)
>   	synchronize_srcu(&drm_unplug_srcu);
>   
>   	drm_dev_unregister(dev);
> +
> +	/* Clear all CPU mappings pointing to this device */
> +	unmap_mapping_range(dev->anon_inode->i_mapping, 0, 0, 1);
>   }
>   EXPORT_SYMBOL(drm_dev_unplug);
>   


* Re: [PATCH v3 06/12] drm/sched: Cancel and flush all outstanding jobs before finish.
  2020-11-21  5:21   ` Andrey Grodzovsky
@ 2020-11-22 11:56     ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-22 11:56 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> To avoid any possible use after free.
>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index da24c4e..c3f0bd0 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -891,6 +891,9 @@ void drm_sched_fini(struct drm_gpu_scheduler *sched)
>   	if (sched->thread)
>   		kthread_stop(sched->thread);
>   
> +	/* Confirm no work left behind accessing device structures */
> +	cancel_delayed_work_sync(&sched->work_tdr);
> +
>   	sched->ready = false;
>   }
>   EXPORT_SYMBOL(drm_sched_fini);


* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-21  5:21   ` Andrey Grodzovsky
@ 2020-11-22 11:57     ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-22 11:57 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> No point to try recovery if device is gone, it's meaningless.

I think that this should go into the device specific recovery function 
and not in the scheduler.

Christian.

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>   drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>   drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>   drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>   drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>   drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>   include/drm/gpu_scheduler.h               |  6 +++++-
>   7 files changed, 35 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index d56f402..d0b0021 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>   
>   		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>   				   num_hw_submission, amdgpu_job_hang_limit,
> -				   timeout, ring->name);
> +				   timeout, ring->name, &adev->ddev);
>   		if (r) {
>   			DRM_ERROR("Failed to create scheduler on ring %s.\n",
>   				  ring->name);
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index cd46c88..7678287 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>   
>   	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>   			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> -			     msecs_to_jiffies(500), dev_name(gpu->dev));
> +			     msecs_to_jiffies(500), dev_name(gpu->dev),
> +			     gpu->drm);
>   	if (ret)
>   		return ret;
>   
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index dc6df9e..8a7e5d7ca 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
>   
>   	return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>   			      lima_job_hang_limit, msecs_to_jiffies(timeout),
> -			      name);
> +			      name,
> +			      pipe->ldev->ddev);
>   }
>   
>   void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 30e7b71..37b03b01 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
>   		ret = drm_sched_init(&js->queue[j].sched,
>   				     &panfrost_sched_ops,
>   				     1, 0, msecs_to_jiffies(500),
> -				     "pan_js");
> +				     "pan_js", pfdev->ddev);
>   		if (ret) {
>   			dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
>   			goto err_sched;
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index c3f0bd0..95db8c6 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -53,6 +53,7 @@
>   #include <drm/drm_print.h>
>   #include <drm/gpu_scheduler.h>
>   #include <drm/spsc_queue.h>
> +#include <drm/drm_drv.h>
>   
>   #define CREATE_TRACE_POINTS
>   #include "gpu_scheduler_trace.h"
> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   	struct drm_gpu_scheduler *sched;
>   	struct drm_sched_job *job;
>   
> +	int idx;
> +
>   	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>   
> +	if (!drm_dev_enter(sched->ddev, &idx)) {
> +		DRM_INFO("%s - device unplugged skipping recovery on scheduler:%s",
> +			 __func__, sched->name);
> +		return;
> +	}
> +
>   	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>   	spin_lock(&sched->job_list_lock);
>   	job = list_first_entry_or_null(&sched->ring_mirror_list,
> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   	spin_lock(&sched->job_list_lock);
>   	drm_sched_start_timeout(sched);
>   	spin_unlock(&sched->job_list_lock);
> +
> +	drm_dev_exit(idx);
>   }
>   
>    /**
> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		   unsigned hw_submission,
>   		   unsigned hang_limit,
>   		   long timeout,
> -		   const char *name)
> +		   const char *name,
> +		   struct drm_device *ddev)
>   {
>   	int i, ret;
>   	sched->ops = ops;
> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   	sched->name = name;
>   	sched->timeout = timeout;
>   	sched->hang_limit = hang_limit;
> +	sched->ddev = ddev;
>   	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
>   		drm_sched_rq_init(sched, &sched->sched_rq[i]);
>   
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 0747614..f5076e5 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			     &v3d_bin_sched_ops,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms),
> -			     "v3d_bin");
> +			     "v3d_bin",
> +			     &v3d->drm);
>   	if (ret) {
>   		dev_err(v3d->drm.dev, "Failed to create bin scheduler: %d.", ret);
>   		return ret;
> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			     &v3d_render_sched_ops,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms),
> -			     "v3d_render");
> +			     "v3d_render",
> +			     &v3d->drm);
>   	if (ret) {
>   		dev_err(v3d->drm.dev, "Failed to create render scheduler: %d.",
>   			ret);
> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			     &v3d_tfu_sched_ops,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms),
> -			     "v3d_tfu");
> +			     "v3d_tfu",
> +			     &v3d->drm);
>   	if (ret) {
>   		dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>   			ret);
> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   				     &v3d_csd_sched_ops,
>   				     hw_jobs_limit, job_hang_limit,
>   				     msecs_to_jiffies(hang_limit_ms),
> -				     "v3d_csd");
> +				     "v3d_csd",
> +				     &v3d->drm);
>   		if (ret) {
>   			dev_err(v3d->drm.dev, "Failed to create CSD scheduler: %d.",
>   				ret);
> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   				     &v3d_cache_clean_sched_ops,
>   				     hw_jobs_limit, job_hang_limit,
>   				     msecs_to_jiffies(hang_limit_ms),
> -				     "v3d_cache_clean");
> +				     "v3d_cache_clean",
> +				     &v3d->drm);
>   		if (ret) {
>   			dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN scheduler: %d.",
>   				ret);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 9243655..a980709 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -32,6 +32,7 @@
>   
>   struct drm_gpu_scheduler;
>   struct drm_sched_rq;
> +struct drm_device;
>   
>   /* These are often used as an (initial) index
>    * to an array, and as such should start at 0.
> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>    * @score: score to help loadbalancer pick a idle sched
>    * @ready: marks if the underlying HW is ready to work
>    * @free_guilty: A hit to time out handler to free the guilty job.
> + * @ddev: Pointer to drm device of this scheduler.
>    *
>    * One scheduler is implemented for each hardware ring.
>    */
> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>   	atomic_t                        score;
>   	bool				ready;
>   	bool				free_guilty;
> +	struct drm_device		*ddev;
>   };
>   
>   int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		   const struct drm_sched_backend_ops *ops,
>   		   uint32_t hw_submission, unsigned hang_limit, long timeout,
> -		   const char *name);
> +		   const char *name,
> +		   struct drm_device *ddev);
>   
>   void drm_sched_fini(struct drm_gpu_scheduler *sched);
>   int drm_sched_job_init(struct drm_sched_job *job,

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
@ 2020-11-22 11:57     ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-22 11:57 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh, ppaalanen, Harry.Wentland

Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> No point to try recovery if device is gone, it's meaningless.

I think that this should go into the device specific recovery function 
and not in the scheduler.

Christian.
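Wherever the check ends up living, the control flow under discussion is the same: bail out of the timeout handler early when the device is gone. Below is a minimal, self-contained userspace mock of that flow. All names are made up for illustration; a plain flag stands in for the real SRCU-based drm_dev_enter()/drm_dev_exit(), so this is a sketch of the pattern, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock of the drm_dev_enter()/drm_dev_exit() guard.
 * The real helpers take an SRCU read lock and return an index via an
 * out parameter; a boolean flag is enough to show the control flow. */
struct mock_dev {
	bool unplugged;
};

static bool mock_dev_enter(struct mock_dev *dev)
{
	return !dev->unplugged;
}

static void mock_dev_exit(struct mock_dev *dev)
{
	(void)dev; /* would drop the SRCU read lock */
}

/* Returns true if recovery ran, false if it was skipped. */
static bool mock_job_timedout(struct mock_dev *dev, int *recoveries)
{
	if (!mock_dev_enter(dev))
		return false;	/* device gone: recovery is meaningless */

	(*recoveries)++;	/* ... reset HW, resubmit jobs ... */

	mock_dev_exit(dev);
	return true;
}
```

Once the flag flips, every subsequent timeout simply returns without touching the (now absent) hardware.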

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>   drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>   drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>   drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>   drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>   drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>   include/drm/gpu_scheduler.h               |  6 +++++-
>   7 files changed, 35 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index d56f402..d0b0021 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>   
>   		r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>   				   num_hw_submission, amdgpu_job_hang_limit,
> -				   timeout, ring->name);
> +				   timeout, ring->name, &adev->ddev);
>   		if (r) {
>   			DRM_ERROR("Failed to create scheduler on ring %s.\n",
>   				  ring->name);
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> index cd46c88..7678287 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>   
>   	ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>   			     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
> -			     msecs_to_jiffies(500), dev_name(gpu->dev));
> +			     msecs_to_jiffies(500), dev_name(gpu->dev),
> +			     gpu->drm);
>   	if (ret)
>   		return ret;
>   
> diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c
> index dc6df9e..8a7e5d7ca 100644
> --- a/drivers/gpu/drm/lima/lima_sched.c
> +++ b/drivers/gpu/drm/lima/lima_sched.c
> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name)
>   
>   	return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>   			      lima_job_hang_limit, msecs_to_jiffies(timeout),
> -			      name);
> +			      name,
> +			      pipe->ldev->ddev);
>   }
>   
>   void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 30e7b71..37b03b01 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
>   		ret = drm_sched_init(&js->queue[j].sched,
>   				     &panfrost_sched_ops,
>   				     1, 0, msecs_to_jiffies(500),
> -				     "pan_js");
> +				     "pan_js", pfdev->ddev);
>   		if (ret) {
>   			dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
>   			goto err_sched;
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index c3f0bd0..95db8c6 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -53,6 +53,7 @@
>   #include <drm/drm_print.h>
>   #include <drm/gpu_scheduler.h>
>   #include <drm/spsc_queue.h>
> +#include <drm/drm_drv.h>
>   
>   #define CREATE_TRACE_POINTS
>   #include "gpu_scheduler_trace.h"
> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   	struct drm_gpu_scheduler *sched;
>   	struct drm_sched_job *job;
>   
> +	int idx;
> +
>   	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>   
> +	if (!drm_dev_enter(sched->ddev, &idx)) {
> +		DRM_INFO("%s - device unplugged skipping recovery on scheduler:%s",
> +			 __func__, sched->name);
> +		return;
> +	}
> +
>   	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>   	spin_lock(&sched->job_list_lock);
>   	job = list_first_entry_or_null(&sched->ring_mirror_list,
> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
>   	spin_lock(&sched->job_list_lock);
>   	drm_sched_start_timeout(sched);
>   	spin_unlock(&sched->job_list_lock);
> +
> +	drm_dev_exit(idx);
>   }
>   
>    /**
> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		   unsigned hw_submission,
>   		   unsigned hang_limit,
>   		   long timeout,
> -		   const char *name)
> +		   const char *name,
> +		   struct drm_device *ddev)
>   {
>   	int i, ret;
>   	sched->ops = ops;
> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>   	sched->name = name;
>   	sched->timeout = timeout;
>   	sched->hang_limit = hang_limit;
> +	sched->ddev = ddev;
>   	for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
>   		drm_sched_rq_init(sched, &sched->sched_rq[i]);
>   
> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
> index 0747614..f5076e5 100644
> --- a/drivers/gpu/drm/v3d/v3d_sched.c
> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			     &v3d_bin_sched_ops,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms),
> -			     "v3d_bin");
> +			     "v3d_bin",
> +			     &v3d->drm);
>   	if (ret) {
>   		dev_err(v3d->drm.dev, "Failed to create bin scheduler: %d.", ret);
>   		return ret;
> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			     &v3d_render_sched_ops,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms),
> -			     "v3d_render");
> +			     "v3d_render",
> +			     &v3d->drm);
>   	if (ret) {
>   		dev_err(v3d->drm.dev, "Failed to create render scheduler: %d.",
>   			ret);
> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   			     &v3d_tfu_sched_ops,
>   			     hw_jobs_limit, job_hang_limit,
>   			     msecs_to_jiffies(hang_limit_ms),
> -			     "v3d_tfu");
> +			     "v3d_tfu",
> +			     &v3d->drm);
>   	if (ret) {
>   		dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>   			ret);
> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   				     &v3d_csd_sched_ops,
>   				     hw_jobs_limit, job_hang_limit,
>   				     msecs_to_jiffies(hang_limit_ms),
> -				     "v3d_csd");
> +				     "v3d_csd",
> +				     &v3d->drm);
>   		if (ret) {
>   			dev_err(v3d->drm.dev, "Failed to create CSD scheduler: %d.",
>   				ret);
> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>   				     &v3d_cache_clean_sched_ops,
>   				     hw_jobs_limit, job_hang_limit,
>   				     msecs_to_jiffies(hang_limit_ms),
> -				     "v3d_cache_clean");
> +				     "v3d_cache_clean",
> +				     &v3d->drm);
>   		if (ret) {
>   			dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN scheduler: %d.",
>   				ret);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 9243655..a980709 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -32,6 +32,7 @@
>   
>   struct drm_gpu_scheduler;
>   struct drm_sched_rq;
> +struct drm_device;
>   
>   /* These are often used as an (initial) index
>    * to an array, and as such should start at 0.
> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>    * @score: score to help loadbalancer pick a idle sched
>    * @ready: marks if the underlying HW is ready to work
>    * @free_guilty: A hit to time out handler to free the guilty job.
> + * @ddev: Pointer to drm device of this scheduler.
>    *
>    * One scheduler is implemented for each hardware ring.
>    */
> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>   	atomic_t                        score;
>   	bool				ready;
>   	bool				free_guilty;
> +	struct drm_device		*ddev;
>   };
>   
>   int drm_sched_init(struct drm_gpu_scheduler *sched,
>   		   const struct drm_sched_backend_ops *ops,
>   		   uint32_t hw_submission, unsigned hang_limit, long timeout,
> -		   const char *name);
> +		   const char *name,
> +		   struct drm_device *ddev);
>   
>   void drm_sched_fini(struct drm_gpu_scheduler *sched);
>   int drm_sched_job_init(struct drm_sched_job *job,

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2020-11-21 14:15     ` Christian König
@ 2020-11-23  4:54       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23  4:54 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/21/20 9:15 AM, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>> Will be used to reroute CPU mapped BO's page faults once
>> device is removed.
>
> Uff, one page for each exported DMA-buf? That's not something we can do.
>
> We need to find a different approach here.
>
> Can't we call alloc_page() on each fault and link them together so they are 
> freed when the device is finally reaped?


For sure it's better to optimize and allocate on demand when we reach this corner 
case, but why the linking?
Shouldn't drm_prime_gem_destroy be a good enough place to free?

Andrey
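For reference, Christian's allocate-on-fault-and-link idea can be sketched in isolation. This is a hypothetical userspace mock with invented names; malloc()/calloc() stand in for alloc_page(GFP_KERNEL | __GFP_ZERO), and the list models pages kept alive until the device is finally reaped:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: one dummy page allocated per fault, linked into
 * a per-device list so all of them can be freed at teardown. */
struct dummy_page {
	void *page;
	struct dummy_page *next;
};

/* Called from the (mock) fault handler: allocate on demand and link. */
static void *fault_get_dummy_page(struct dummy_page **list)
{
	struct dummy_page *entry = malloc(sizeof(*entry));

	if (!entry)
		return NULL;
	entry->page = calloc(1, 4096);	/* models alloc_page(__GFP_ZERO) */
	if (!entry->page) {
		free(entry);
		return NULL;
	}
	entry->next = *list;
	*list = entry;
	return entry->page;
}

/* Called when the device is finally reaped: free the whole chain.
 * Returns the number of pages freed. */
static int device_release_dummy_pages(struct dummy_page **list)
{
	int freed = 0;

	while (*list) {
		struct dummy_page *entry = *list;

		*list = entry->next;
		free(entry->page);
		free(entry);
		freed++;
	}
	return freed;
}
```

The linking matters because the faults happen per mapping, but the lifetime that bounds them is the device's, not any single object's.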


>
> Regards,
> Christian.
>
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>   include/drm/drm_file.h      |  2 ++
>>   include/drm/drm_gem.h       |  2 ++
>>   4 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index 0ac4566..ff3d39f 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>               goto out_prime_destroy;
>>       }
>>   +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +    if (!file->dummy_page) {
>> +        ret = -ENOMEM;
>> +        goto out_prime_destroy;
>> +    }
>> +
>>       return file;
>>     out_prime_destroy:
>> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>>       if (dev->driver->postclose)
>>           dev->driver->postclose(dev, file);
>>   +    __free_page(file->dummy_page);
>> +
>>       drm_prime_destroy_file_private(&file->prime);
>>         WARN_ON(!list_empty(&file->event_list));
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index 1693aa7..987b45c 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>         ret = drm_prime_add_buf_handle(&file_priv->prime,
>>               dma_buf, *handle);
>> +
>> +    if (!ret) {
>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +        if (!obj->dummy_page)
>> +            ret = -ENOMEM;
>> +    }
>> +
>>       mutex_unlock(&file_priv->prime.lock);
>>       if (ret)
>>           goto fail;
>> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, 
>> struct sg_table *sg)
>>           dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>       dma_buf = attach->dmabuf;
>>       dma_buf_detach(attach->dmabuf, attach);
>> +
>> +    __free_page(obj->dummy_page);
>> +
>>       /* remove the reference */
>>       dma_buf_put(dma_buf);
>>   }
>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>> index 716990b..2a011fc 100644
>> --- a/include/drm/drm_file.h
>> +++ b/include/drm/drm_file.h
>> @@ -346,6 +346,8 @@ struct drm_file {
>>        */
>>       struct drm_prime_file_private prime;
>>   +    struct page *dummy_page;
>> +
>>       /* private: */
>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>       unsigned long lock_count; /* DRI1 legacy lock count */
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index 337a483..76a97a3 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -311,6 +311,8 @@ struct drm_gem_object {
>>        *
>>        */
>>       const struct drm_gem_object_funcs *funcs;
>> +
>> +    struct page *dummy_page;
>>   };
>>     /**
>

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2020-11-23  4:54       ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23  4:54 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh, ppaalanen, Harry.Wentland


On 11/21/20 9:15 AM, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>> Will be used to reroute CPU mapped BO's page faults once
>> device is removed.
>
> Uff, one page for each exported DMA-buf? That's not something we can do.
>
> We need to find a different approach here.
>
> Can't we call alloc_page() on each fault and link them together so they are 
> freed when the device is finally reaped?


For sure it's better to optimize and allocate on demand when we reach this corner 
case, but why the linking?
Shouldn't drm_prime_gem_destroy be a good enough place to free?

Andrey


>
> Regards,
> Christian.
>
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>   include/drm/drm_file.h      |  2 ++
>>   include/drm/drm_gem.h       |  2 ++
>>   4 files changed, 22 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>> index 0ac4566..ff3d39f 100644
>> --- a/drivers/gpu/drm/drm_file.c
>> +++ b/drivers/gpu/drm/drm_file.c
>> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>               goto out_prime_destroy;
>>       }
>>   +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +    if (!file->dummy_page) {
>> +        ret = -ENOMEM;
>> +        goto out_prime_destroy;
>> +    }
>> +
>>       return file;
>>     out_prime_destroy:
>> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>>       if (dev->driver->postclose)
>>           dev->driver->postclose(dev, file);
>>   +    __free_page(file->dummy_page);
>> +
>>       drm_prime_destroy_file_private(&file->prime);
>>         WARN_ON(!list_empty(&file->event_list));
>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>> index 1693aa7..987b45c 100644
>> --- a/drivers/gpu/drm/drm_prime.c
>> +++ b/drivers/gpu/drm/drm_prime.c
>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>         ret = drm_prime_add_buf_handle(&file_priv->prime,
>>               dma_buf, *handle);
>> +
>> +    if (!ret) {
>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>> +        if (!obj->dummy_page)
>> +            ret = -ENOMEM;
>> +    }
>> +
>>       mutex_unlock(&file_priv->prime.lock);
>>       if (ret)
>>           goto fail;
>> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct drm_gem_object *obj, 
>> struct sg_table *sg)
>>           dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>       dma_buf = attach->dmabuf;
>>       dma_buf_detach(attach->dmabuf, attach);
>> +
>> +    __free_page(obj->dummy_page);
>> +
>>       /* remove the reference */
>>       dma_buf_put(dma_buf);
>>   }
>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>> index 716990b..2a011fc 100644
>> --- a/include/drm/drm_file.h
>> +++ b/include/drm/drm_file.h
>> @@ -346,6 +346,8 @@ struct drm_file {
>>        */
>>       struct drm_prime_file_private prime;
>>   +    struct page *dummy_page;
>> +
>>       /* private: */
>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>       unsigned long lock_count; /* DRI1 legacy lock count */
>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>> index 337a483..76a97a3 100644
>> --- a/include/drm/drm_gem.h
>> +++ b/include/drm/drm_gem.h
>> @@ -311,6 +311,8 @@ struct drm_gem_object {
>>        *
>>        */
>>       const struct drm_gem_object_funcs *funcs;
>> +
>> +    struct page *dummy_page;
>>   };
>>     /**
>

* Re: [PATCH v3 04/12] drm/ttm: Set dma addr to null after freee
  2020-11-21 14:13     ` Christian König
@ 2020-11-23  5:15       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23  5:15 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/21/20 9:13 AM, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>> Fixes oops.
>
> That file doesn't even exist any more. What oops should this fix?


Which file?
We set dma_address to NULL in every other place after unmap. This way, if a dma 
address was already unmapped, we skip it the next time we enter 
ttm_unmap_and_unpopulate_pages with the same tt for some reason.
The oops happens with IOMMU enabled. The device is removed from its IOMMU group
during PCI remove, but the BOs are all still alive if a user mode client holds a 
reference to the drm file.
Later, when the reference is dropped and device fini happens, I get an oops in
ttm_unmap_and_unpopulate_pages->dma_unmap_page because the IOMMU group structures 
are gone already.
Patch [11/12] drm/amdgpu: Register IOMMU topology notifier per device together 
with this patch solves the oops.

Andrey
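The skip-if-already-unmapped behavior described above can be shown with a small self-contained mock. Names and types are invented for illustration (dma_addr_t modeled as uint64_t, a counter standing in for dma_unmap_page(), which in the real failure oopses when the IOMMU group is already gone):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of the unmap guard: count how many unmap calls a
 * repeated teardown issues when the stored address is cleared after use. */
typedef uint64_t dma_addr_t;

static int unmap_calls;

static void mock_dma_unmap_page(dma_addr_t addr)
{
	(void)addr;
	unmap_calls++;	/* in the real oops this touches freed IOMMU state */
}

static void mock_unmap_and_unpopulate(dma_addr_t *dma_address, int n)
{
	for (int i = 0; i < n; i++) {
		if (!dma_address[i])
			continue;		/* already unmapped: skip */
		mock_dma_unmap_page(dma_address[i]);
		dma_address[i] = 0;		/* the fix: clear after free */
	}
}
```

A second teardown pass over the same addresses is then a no-op instead of a double unmap.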


>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
>> b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> index b40a467..b0df328 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> @@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct device *dev, 
>> struct ttm_dma_tt *tt)
>>           dma_unmap_page(dev, tt->dma_address[i], num_pages * PAGE_SIZE,
>>                      DMA_BIDIRECTIONAL);
>>   +        tt->dma_address[i] = 0;
>> +
>>           i += num_pages;
>>       }
>>       ttm_pool_unpopulate(&tt->ttm);
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>

* Re: [PATCH v3 04/12] drm/ttm: Set dma addr to null after freee
@ 2020-11-23  5:15       ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23  5:15 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh, ppaalanen, Harry.Wentland


On 11/21/20 9:13 AM, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>> Fixes oops.
>
> That file doesn't even exist any more. What oops should this fix?


Which file?
We set dma_address to NULL in every other place after unmap. This way, if a dma 
address was already unmapped, we skip it the next time we enter 
ttm_unmap_and_unpopulate_pages with the same tt for some reason.
The oops happens with IOMMU enabled. The device is removed from its IOMMU group
during PCI remove, but the BOs are all still alive if a user mode client holds a 
reference to the drm file.
Later, when the reference is dropped and device fini happens, I get an oops in
ttm_unmap_and_unpopulate_pages->dma_unmap_page because the IOMMU group structures 
are gone already.
Patch [11/12] drm/amdgpu: Register IOMMU topology notifier per device together 
with this patch solves the oops.

Andrey


>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
>> b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> index b40a467..b0df328 100644
>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>> @@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct device *dev, 
>> struct ttm_dma_tt *tt)
>>           dma_unmap_page(dev, tt->dma_address[i], num_pages * PAGE_SIZE,
>>                      DMA_BIDIRECTIONAL);
>>   +        tt->dma_address[i] = 0;
>> +
>>           i += num_pages;
>>       }
>>       ttm_pool_unpopulate(&tt->ttm);
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-22 11:57     ` Christian König
@ 2020-11-23  5:37       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23  5:37 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/22/20 6:57 AM, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>> No point to try recovery if device is gone, it's meaningless.
>
> I think that this should go into the device specific recovery function and not 
> in the scheduler.


The timeout timer is rearmed here, so this prevents any new recovery work from 
restarting from this point after drm_dev_unplug was executed from 
amdgpu_pci_remove. It will not cover other places like job cleanup or starting a 
new job, but those should stop once the scheduler thread is stopped later.

Andrey
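The rearm point can be mocked in a few lines. This is a hypothetical, self-contained sketch (invented names, no real scheduler state): the early return happens before the step that would rearm the timeout timer, so after unplug no further timeout work is queued:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock: the early device check returns before the timer
 * is rearmed, so recovery work stops re-triggering after unplug. */
struct mock_sched {
	bool dev_unplugged;
	bool timer_armed;
};

static void mock_sched_job_timedout(struct mock_sched *sched)
{
	sched->timer_armed = false;	/* work fired, timer consumed */

	if (sched->dev_unplugged)
		return;			/* bail before rearming */

	/* ... recovery ... */

	sched->timer_armed = true;	/* models drm_sched_start_timeout() */
}
```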


>
> Christian.
>
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>   drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>   drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>   drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>   drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>   drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>   include/drm/gpu_scheduler.h               |  6 +++++-
>>   7 files changed, 35 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index d56f402..d0b0021 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>>             r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>                      num_hw_submission, amdgpu_job_hang_limit,
>> -                   timeout, ring->name);
>> +                   timeout, ring->name, &adev->ddev);
>>           if (r) {
>>               DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>                     ring->name);
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> index cd46c88..7678287 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>         ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>                    etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>> +                 gpu->drm);
>>       if (ret)
>>           return ret;
>>   diff --git a/drivers/gpu/drm/lima/lima_sched.c 
>> b/drivers/gpu/drm/lima/lima_sched.c
>> index dc6df9e..8a7e5d7ca 100644
>> --- a/drivers/gpu/drm/lima/lima_sched.c
>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, 
>> const char *name)
>>         return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>                     lima_job_hang_limit, msecs_to_jiffies(timeout),
>> -                  name);
>> +                  name,
>> +                  pipe->ldev->ddev);
>>   }
>>     void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index 30e7b71..37b03b01 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
>>           ret = drm_sched_init(&js->queue[j].sched,
>>                        &panfrost_sched_ops,
>>                        1, 0, msecs_to_jiffies(500),
>> -                     "pan_js");
>> +                     "pan_js", pfdev->ddev);
>>           if (ret) {
>>               dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
>>               goto err_sched;
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index c3f0bd0..95db8c6 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -53,6 +53,7 @@
>>   #include <drm/drm_print.h>
>>   #include <drm/gpu_scheduler.h>
>>   #include <drm/spsc_queue.h>
>> +#include <drm/drm_drv.h>
>>     #define CREATE_TRACE_POINTS
>>   #include "gpu_scheduler_trace.h"
>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct work_struct 
>> *work)
>>       struct drm_gpu_scheduler *sched;
>>       struct drm_sched_job *job;
>>   +    int idx;
>> +
>>       sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>   +    if (!drm_dev_enter(sched->ddev, &idx)) {
>> +        DRM_INFO("%s - device unplugged skipping recovery on scheduler:%s",
>> +             __func__, sched->name);
>> +        return;
>> +    }
>> +
>>       /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>>       spin_lock(&sched->job_list_lock);
>>       job = list_first_entry_or_null(&sched->ring_mirror_list,
>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>       spin_lock(&sched->job_list_lock);
>>       drm_sched_start_timeout(sched);
>>       spin_unlock(&sched->job_list_lock);
>> +
>> +    drm_dev_exit(idx);
>>   }
>>      /**
>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>              unsigned hw_submission,
>>              unsigned hang_limit,
>>              long timeout,
>> -           const char *name)
>> +           const char *name,
>> +           struct drm_device *ddev)
>>   {
>>       int i, ret;
>>       sched->ops = ops;
>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>       sched->name = name;
>>       sched->timeout = timeout;
>>       sched->hang_limit = hang_limit;
>> +    sched->ddev = ddev;
>>       for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
>>           drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>   diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
>> index 0747614..f5076e5 100644
>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                    &v3d_bin_sched_ops,
>>                    hw_jobs_limit, job_hang_limit,
>>                    msecs_to_jiffies(hang_limit_ms),
>> -                 "v3d_bin");
>> +                 "v3d_bin",
>> +                 &v3d->drm);
>>       if (ret) {
>>           dev_err(v3d->drm.dev, "Failed to create bin scheduler: %d.", ret);
>>           return ret;
>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                    &v3d_render_sched_ops,
>>                    hw_jobs_limit, job_hang_limit,
>>                    msecs_to_jiffies(hang_limit_ms),
>> -                 "v3d_render");
>> +                 "v3d_render",
>> +                 &v3d->drm);
>>       if (ret) {
>>           dev_err(v3d->drm.dev, "Failed to create render scheduler: %d.",
>>               ret);
>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                    &v3d_tfu_sched_ops,
>>                    hw_jobs_limit, job_hang_limit,
>>                    msecs_to_jiffies(hang_limit_ms),
>> -                 "v3d_tfu");
>> +                 "v3d_tfu",
>> +                 &v3d->drm);
>>       if (ret) {
>>           dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>               ret);
>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                        &v3d_csd_sched_ops,
>>                        hw_jobs_limit, job_hang_limit,
>>                        msecs_to_jiffies(hang_limit_ms),
>> -                     "v3d_csd");
>> +                     "v3d_csd",
>> +                     &v3d->drm);
>>           if (ret) {
>>               dev_err(v3d->drm.dev, "Failed to create CSD scheduler: %d.",
>>                   ret);
>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                        &v3d_cache_clean_sched_ops,
>>                        hw_jobs_limit, job_hang_limit,
>>                        msecs_to_jiffies(hang_limit_ms),
>> -                     "v3d_cache_clean");
>> +                     "v3d_cache_clean",
>> +                     &v3d->drm);
>>           if (ret) {
>>               dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN scheduler: %d.",
>>                   ret);
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index 9243655..a980709 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -32,6 +32,7 @@
>>     struct drm_gpu_scheduler;
>>   struct drm_sched_rq;
>> +struct drm_device;
>>     /* These are often used as an (initial) index
>>    * to an array, and as such should start at 0.
>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>    * @score: score to help loadbalancer pick a idle sched
>>    * @ready: marks if the underlying HW is ready to work
>>    * @free_guilty: A hit to time out handler to free the guilty job.
>> + * @ddev: Pointer to drm device of this scheduler.
>>    *
>>    * One scheduler is implemented for each hardware ring.
>>    */
>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>       atomic_t                        score;
>>       bool                ready;
>>       bool                free_guilty;
>> +    struct drm_device        *ddev;
>>   };
>>     int drm_sched_init(struct drm_gpu_scheduler *sched,
>>              const struct drm_sched_backend_ops *ops,
>>              uint32_t hw_submission, unsigned hang_limit, long timeout,
>> -           const char *name);
>> +           const char *name,
>> +           struct drm_device *ddev);
>>     void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>   int drm_sched_job_init(struct drm_sched_job *job,
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
@ 2020-11-23  5:37       ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23  5:37 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh, ppaalanen, Harry.Wentland


On 11/22/20 6:57 AM, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>> No point to try recovery if device is gone, it's meaningless.
>
> I think that this should go into the device specific recovery function and not 
> in the scheduler.


The timeout timer is rearmed here, so this prevents any new recovery work from 
restarting here
after drm_dev_unplug has been executed from amdgpu_pci_remove. It will not cover 
other places like
job cleanup or starting a new job, but those should stop once the scheduler 
thread is stopped later.
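The control flow Andrey describes can be sketched as a small user-space model. This is not kernel code: `model_dev_*` and `model_timedout_handler` are hypothetical names standing in for drm_dev_enter()/drm_dev_exit()/drm_dev_unplug() and drm_sched_job_timedout(); the real primitives use SRCU, which this sketch only approximates with atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* User-space model of the drm_dev_enter()/drm_dev_unplug() pattern:
 * work handlers bracket device access with enter/exit, and once the
 * device is unplugged, enter fails so the handler bails out before
 * touching freed device state. */

struct model_dev {
	atomic_bool unplugged;
	atomic_int  active;   /* handlers currently inside enter/exit */
};

static void model_dev_init(struct model_dev *dev)
{
	atomic_store(&dev->unplugged, false);
	atomic_store(&dev->active, 0);
}

/* Returns true and holds a "section reference" if the device is present. */
static bool model_dev_enter(struct model_dev *dev)
{
	atomic_fetch_add(&dev->active, 1);
	if (atomic_load(&dev->unplugged)) {
		atomic_fetch_sub(&dev->active, 1);
		return false;
	}
	return true;
}

static void model_dev_exit(struct model_dev *dev)
{
	atomic_fetch_sub(&dev->active, 1);
}

static void model_dev_unplug(struct model_dev *dev)
{
	atomic_store(&dev->unplugged, true);
	/* The real drm_dev_unplug() additionally waits (via SRCU) for all
	 * sections that entered before the flag flipped to exit. */
}

/* Shape of the timeout handler in the patch: skip recovery when gone. */
static bool model_timedout_handler(struct model_dev *dev, int *recoveries)
{
	if (!model_dev_enter(dev))
		return false;      /* device unplugged, skip recovery */
	(*recoveries)++;           /* ...actual recovery work here... */
	model_dev_exit(dev);
	return true;
}
```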

Andrey


>
> Christian.
>
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>   drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>   drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>   drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>   drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>   drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>   include/drm/gpu_scheduler.h               |  6 +++++-
>>   7 files changed, 35 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index d56f402..d0b0021 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>>             r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>                      num_hw_submission, amdgpu_job_hang_limit,
>> -                   timeout, ring->name);
>> +                   timeout, ring->name, &adev->ddev);
>>           if (r) {
>>               DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>                     ring->name);
>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> index cd46c88..7678287 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>         ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>                    etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>> +                 gpu->drm);
>>       if (ret)
>>           return ret;
>>   diff --git a/drivers/gpu/drm/lima/lima_sched.c 
>> b/drivers/gpu/drm/lima/lima_sched.c
>> index dc6df9e..8a7e5d7ca 100644
>> --- a/drivers/gpu/drm/lima/lima_sched.c
>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, 
>> const char *name)
>>         return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>                     lima_job_hang_limit, msecs_to_jiffies(timeout),
>> -                  name);
>> +                  name,
>> +                  pipe->ldev->ddev);
>>   }
>>     void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>> index 30e7b71..37b03b01 100644
>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device *pfdev)
>>           ret = drm_sched_init(&js->queue[j].sched,
>>                        &panfrost_sched_ops,
>>                        1, 0, msecs_to_jiffies(500),
>> -                     "pan_js");
>> +                     "pan_js", pfdev->ddev);
>>           if (ret) {
>>               dev_err(pfdev->dev, "Failed to create scheduler: %d.", ret);
>>               goto err_sched;
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index c3f0bd0..95db8c6 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -53,6 +53,7 @@
>>   #include <drm/drm_print.h>
>>   #include <drm/gpu_scheduler.h>
>>   #include <drm/spsc_queue.h>
>> +#include <drm/drm_drv.h>
>>     #define CREATE_TRACE_POINTS
>>   #include "gpu_scheduler_trace.h"
>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct work_struct 
>> *work)
>>       struct drm_gpu_scheduler *sched;
>>       struct drm_sched_job *job;
>>   +    int idx;
>> +
>>       sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
>>   +    if (!drm_dev_enter(sched->ddev, &idx)) {
>> +        DRM_INFO("%s - device unplugged skipping recovery on scheduler:%s",
>> +             __func__, sched->name);
>> +        return;
>> +    }
>> +
>>       /* Protects against concurrent deletion in drm_sched_get_cleanup_job */
>>       spin_lock(&sched->job_list_lock);
>>       job = list_first_entry_or_null(&sched->ring_mirror_list,
>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct work_struct *work)
>>       spin_lock(&sched->job_list_lock);
>>       drm_sched_start_timeout(sched);
>>       spin_unlock(&sched->job_list_lock);
>> +
>> +    drm_dev_exit(idx);
>>   }
>>      /**
>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>              unsigned hw_submission,
>>              unsigned hang_limit,
>>              long timeout,
>> -           const char *name)
>> +           const char *name,
>> +           struct drm_device *ddev)
>>   {
>>       int i, ret;
>>       sched->ops = ops;
>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>       sched->name = name;
>>       sched->timeout = timeout;
>>       sched->hang_limit = hang_limit;
>> +    sched->ddev = ddev;
>>       for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; i++)
>>           drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>   diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
>> index 0747614..f5076e5 100644
>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                    &v3d_bin_sched_ops,
>>                    hw_jobs_limit, job_hang_limit,
>>                    msecs_to_jiffies(hang_limit_ms),
>> -                 "v3d_bin");
>> +                 "v3d_bin",
>> +                 &v3d->drm);
>>       if (ret) {
>>           dev_err(v3d->drm.dev, "Failed to create bin scheduler: %d.", ret);
>>           return ret;
>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                    &v3d_render_sched_ops,
>>                    hw_jobs_limit, job_hang_limit,
>>                    msecs_to_jiffies(hang_limit_ms),
>> -                 "v3d_render");
>> +                 "v3d_render",
>> +                 &v3d->drm);
>>       if (ret) {
>>           dev_err(v3d->drm.dev, "Failed to create render scheduler: %d.",
>>               ret);
>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                    &v3d_tfu_sched_ops,
>>                    hw_jobs_limit, job_hang_limit,
>>                    msecs_to_jiffies(hang_limit_ms),
>> -                 "v3d_tfu");
>> +                 "v3d_tfu",
>> +                 &v3d->drm);
>>       if (ret) {
>>           dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>               ret);
>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                        &v3d_csd_sched_ops,
>>                        hw_jobs_limit, job_hang_limit,
>>                        msecs_to_jiffies(hang_limit_ms),
>> -                     "v3d_csd");
>> +                     "v3d_csd",
>> +                     &v3d->drm);
>>           if (ret) {
>>               dev_err(v3d->drm.dev, "Failed to create CSD scheduler: %d.",
>>                   ret);
>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>                        &v3d_cache_clean_sched_ops,
>>                        hw_jobs_limit, job_hang_limit,
>>                        msecs_to_jiffies(hang_limit_ms),
>> -                     "v3d_cache_clean");
>> +                     "v3d_cache_clean",
>> +                     &v3d->drm);
>>           if (ret) {
>>               dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN scheduler: %d.",
>>                   ret);
>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>> index 9243655..a980709 100644
>> --- a/include/drm/gpu_scheduler.h
>> +++ b/include/drm/gpu_scheduler.h
>> @@ -32,6 +32,7 @@
>>     struct drm_gpu_scheduler;
>>   struct drm_sched_rq;
>> +struct drm_device;
>>     /* These are often used as an (initial) index
>>    * to an array, and as such should start at 0.
>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>    * @score: score to help loadbalancer pick a idle sched
>>    * @ready: marks if the underlying HW is ready to work
>>    * @free_guilty: A hit to time out handler to free the guilty job.
>> + * @ddev: Pointer to drm device of this scheduler.
>>    *
>>    * One scheduler is implemented for each hardware ring.
>>    */
>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>       atomic_t                        score;
>>       bool                ready;
>>       bool                free_guilty;
>> +    struct drm_device        *ddev;
>>   };
>>     int drm_sched_init(struct drm_gpu_scheduler *sched,
>>              const struct drm_sched_backend_ops *ops,
>>              uint32_t hw_submission, unsigned hang_limit, long timeout,
>> -           const char *name);
>> +           const char *name,
>> +           struct drm_device *ddev);
>>     void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>   int drm_sched_job_init(struct drm_sched_job *job,
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2020-11-23  4:54       ` Andrey Grodzovsky
@ 2020-11-23  8:01         ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-23  8:01 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>
> On 11/21/20 9:15 AM, Christian König wrote:
>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>> Will be used to reroute CPU mapped BO's page faults once
>>> device is removed.
>>
>> Uff, one page for each exported DMA-buf? That's not something we can do.
>>
>> We need to find a different approach here.
>>
>> Can't we call alloc_page() on each fault and link them together so 
>> they are freed when the device is finally reaped?
>
>
> For sure it is better to optimize and allocate on demand when we reach this 
> corner case, but why the linking?
> Shouldn't drm_prime_gem_destroy be a good enough place to free?

I want to avoid keeping the page in the GEM object.

What we can do is to allocate a page on demand for each fault and link 
them together in the bdev instead.

And when the bdev is then finally destroyed after the last application 
closed we can finally release all of them.
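Christian's suggestion can be sketched as follows. This is a hedged user-space model, not the actual implementation: `model_bdev`, `model_fault_get_dummy_page` and `model_bdev_release_dummy_pages` are hypothetical names, malloc/calloc stand in for alloc_page(GFP_KERNEL | __GFP_ZERO), and a singly linked list stands in for whatever list structure the bdev would actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Model: instead of preallocating one dummy page per file/GEM object,
 * allocate a page only when a post-unplug fault actually happens, chain
 * it on a device-level list, and free the whole chain once the device
 * object is finally destroyed. */

struct dummy_page {
	struct dummy_page *next;
	void *page;               /* stands in for a struct page */
};

struct model_bdev {
	struct dummy_page *dummy_pages;
};

/* Fault path: allocate on demand and link into the bdev. */
static void *model_fault_get_dummy_page(struct model_bdev *bdev)
{
	struct dummy_page *dp = malloc(sizeof(*dp));
	if (!dp)
		return NULL;
	dp->page = calloc(1, 4096);   /* zeroed, like __GFP_ZERO */
	if (!dp->page) {
		free(dp);
		return NULL;
	}
	dp->next = bdev->dummy_pages;
	bdev->dummy_pages = dp;
	return dp->page;
}

/* Device teardown: release every page that was handed out. */
static size_t model_bdev_release_dummy_pages(struct model_bdev *bdev)
{
	size_t n = 0;
	while (bdev->dummy_pages) {
		struct dummy_page *dp = bdev->dummy_pages;
		bdev->dummy_pages = dp->next;
		free(dp->page);
		free(dp);
		n++;
	}
	return n;
}
```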

Christian.

>
> Andrey
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>   include/drm/drm_file.h      |  2 ++
>>>   include/drm/drm_gem.h       |  2 ++
>>>   4 files changed, 22 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>> index 0ac4566..ff3d39f 100644
>>> --- a/drivers/gpu/drm/drm_file.c
>>> +++ b/drivers/gpu/drm/drm_file.c
>>> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct 
>>> drm_minor *minor)
>>>               goto out_prime_destroy;
>>>       }
>>>   +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +    if (!file->dummy_page) {
>>> +        ret = -ENOMEM;
>>> +        goto out_prime_destroy;
>>> +    }
>>> +
>>>       return file;
>>>     out_prime_destroy:
>>> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>>>       if (dev->driver->postclose)
>>>           dev->driver->postclose(dev, file);
>>>   +    __free_page(file->dummy_page);
>>> +
>>>       drm_prime_destroy_file_private(&file->prime);
>>>         WARN_ON(!list_empty(&file->event_list));
>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>> index 1693aa7..987b45c 100644
>>> --- a/drivers/gpu/drm/drm_prime.c
>>> +++ b/drivers/gpu/drm/drm_prime.c
>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct 
>>> drm_device *dev,
>>>         ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>               dma_buf, *handle);
>>> +
>>> +    if (!ret) {
>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +        if (!obj->dummy_page)
>>> +            ret = -ENOMEM;
>>> +    }
>>> +
>>>       mutex_unlock(&file_priv->prime.lock);
>>>       if (ret)
>>>           goto fail;
>>> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct 
>>> drm_gem_object *obj, struct sg_table *sg)
>>>           dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>       dma_buf = attach->dmabuf;
>>>       dma_buf_detach(attach->dmabuf, attach);
>>> +
>>> +    __free_page(obj->dummy_page);
>>> +
>>>       /* remove the reference */
>>>       dma_buf_put(dma_buf);
>>>   }
>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>> index 716990b..2a011fc 100644
>>> --- a/include/drm/drm_file.h
>>> +++ b/include/drm/drm_file.h
>>> @@ -346,6 +346,8 @@ struct drm_file {
>>>        */
>>>       struct drm_prime_file_private prime;
>>>   +    struct page *dummy_page;
>>> +
>>>       /* private: */
>>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>       unsigned long lock_count; /* DRI1 legacy lock count */
>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>> index 337a483..76a97a3 100644
>>> --- a/include/drm/drm_gem.h
>>> +++ b/include/drm/drm_gem.h
>>> @@ -311,6 +311,8 @@ struct drm_gem_object {
>>>        *
>>>        */
>>>       const struct drm_gem_object_funcs *funcs;
>>> +
>>> +    struct page *dummy_page;
>>>   };
>>>     /**
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2020-11-23  8:01         ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-23  8:01 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh, ppaalanen, Harry.Wentland

Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>
> On 11/21/20 9:15 AM, Christian König wrote:
>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>> Will be used to reroute CPU mapped BO's page faults once
>>> device is removed.
>>
>> Uff, one page for each exported DMA-buf? That's not something we can do.
>>
>> We need to find a different approach here.
>>
>> Can't we call alloc_page() on each fault and link them together so 
>> they are freed when the device is finally reaped?
>
>
> For sure it is better to optimize and allocate on demand when we reach this 
> corner case, but why the linking?
> Shouldn't drm_prime_gem_destroy be a good enough place to free?

I want to avoid keeping the page in the GEM object.

What we can do is to allocate a page on demand for each fault and link 
them together in the bdev instead.

And when the bdev is then finally destroyed after the last application 
closed we can finally release all of them.

Christian.

>
> Andrey
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>   include/drm/drm_file.h      |  2 ++
>>>   include/drm/drm_gem.h       |  2 ++
>>>   4 files changed, 22 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>> index 0ac4566..ff3d39f 100644
>>> --- a/drivers/gpu/drm/drm_file.c
>>> +++ b/drivers/gpu/drm/drm_file.c
>>> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct 
>>> drm_minor *minor)
>>>               goto out_prime_destroy;
>>>       }
>>>   +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +    if (!file->dummy_page) {
>>> +        ret = -ENOMEM;
>>> +        goto out_prime_destroy;
>>> +    }
>>> +
>>>       return file;
>>>     out_prime_destroy:
>>> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>>>       if (dev->driver->postclose)
>>>           dev->driver->postclose(dev, file);
>>>   +    __free_page(file->dummy_page);
>>> +
>>>       drm_prime_destroy_file_private(&file->prime);
>>>         WARN_ON(!list_empty(&file->event_list));
>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>> index 1693aa7..987b45c 100644
>>> --- a/drivers/gpu/drm/drm_prime.c
>>> +++ b/drivers/gpu/drm/drm_prime.c
>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct 
>>> drm_device *dev,
>>>         ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>               dma_buf, *handle);
>>> +
>>> +    if (!ret) {
>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>> +        if (!obj->dummy_page)
>>> +            ret = -ENOMEM;
>>> +    }
>>> +
>>>       mutex_unlock(&file_priv->prime.lock);
>>>       if (ret)
>>>           goto fail;
>>> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct 
>>> drm_gem_object *obj, struct sg_table *sg)
>>>           dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>       dma_buf = attach->dmabuf;
>>>       dma_buf_detach(attach->dmabuf, attach);
>>> +
>>> +    __free_page(obj->dummy_page);
>>> +
>>>       /* remove the reference */
>>>       dma_buf_put(dma_buf);
>>>   }
>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>> index 716990b..2a011fc 100644
>>> --- a/include/drm/drm_file.h
>>> +++ b/include/drm/drm_file.h
>>> @@ -346,6 +346,8 @@ struct drm_file {
>>>        */
>>>       struct drm_prime_file_private prime;
>>>   +    struct page *dummy_page;
>>> +
>>>       /* private: */
>>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>       unsigned long lock_count; /* DRI1 legacy lock count */
>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>> index 337a483..76a97a3 100644
>>> --- a/include/drm/drm_gem.h
>>> +++ b/include/drm/drm_gem.h
>>> @@ -311,6 +311,8 @@ struct drm_gem_object {
>>>        *
>>>        */
>>>       const struct drm_gem_object_funcs *funcs;
>>> +
>>> +    struct page *dummy_page;
>>>   };
>>>     /**
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 04/12] drm/ttm: Set dma addr to null after freee
  2020-11-23  5:15       ` Andrey Grodzovsky
@ 2020-11-23  8:04         ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-23  8:04 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 23.11.20 um 06:15 schrieb Andrey Grodzovsky:
>
> On 11/21/20 9:13 AM, Christian König wrote:
>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>> Fixes oops.
>>
>> That file doesn't even exist any more. What oops should this fix?
>
>
> Which file ?

ttm_page_alloc.c

I've rewritten the whole page pool from scratch upstream.

> We set dma_address to NULL in every other place after unmap. This is 
> so that,
> if the dma address was already unmapped, we skip it the next time we enter 
> ttm_unmap_and_unpopulate_pages
> with the same tt for some reason.

Dave and I already fixed that as well by having a flag preventing double 
unpopulate.
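The idempotent-unmap idea Andrey describes can be sketched in isolation. This is a user-space model with hypothetical names (`model_unmap_tt`, `fake_dma_unmap_page`); it only demonstrates why clearing the address after dma_unmap_page() makes a second pass over the same tt a no-op, which is what the one-line patch relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

static int unmap_calls;   /* counts simulated dma_unmap_page() calls */

static void fake_dma_unmap_page(dma_addr_t addr)
{
	(void)addr;
	unmap_calls++;
}

/* Unpopulate pass over a tt's address array: skip entries that were
 * already unmapped, unmap the rest, then clear them so re-entering this
 * function for the same tt (e.g. after the IOMMU group is gone) does not
 * unmap the same address twice. */
static void model_unmap_tt(dma_addr_t *dma_address, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!dma_address[i])   /* already unmapped earlier */
			continue;
		fake_dma_unmap_page(dma_address[i]);
		dma_address[i] = 0;    /* the line the patch adds */
	}
}
```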

> The oops happens with IOMMU enabled. The device is removed from its 
> IOMMU group
> during PCI remove, but the BOs are all still alive if a user mode client 
> holds a reference to the drm file.
> Later, when the reference is dropped and device fini happens, I get an 
> oops in
> ttm_unmap_and_unpopulate_pages->dma_unmap_page because the IOMMU group 
> structures are gone already.
> Patch [11/12] drm/amdgpu: Register IOMMU topology notifier per device, 
> together with this patch, solves the oops.

It should be sufficient to unpopulate all BOs now.

Maybe you should rebase the patches on drm-misc-next.

Christian.

>
> Andrey
>
>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
>>> b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> index b40a467..b0df328 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> @@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct 
>>> device *dev, struct ttm_dma_tt *tt)
>>>           dma_unmap_page(dev, tt->dma_address[i], num_pages * 
>>> PAGE_SIZE,
>>>                      DMA_BIDIRECTIONAL);
>>>   +        tt->dma_address[i] = 0;
>>> +
>>>           i += num_pages;
>>>       }
>>>       ttm_pool_unpopulate(&tt->ttm);
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 04/12] drm/ttm: Set dma addr to null after freee
@ 2020-11-23  8:04         ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-23  8:04 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh, ppaalanen, Harry.Wentland

Am 23.11.20 um 06:15 schrieb Andrey Grodzovsky:
>
> On 11/21/20 9:13 AM, Christian König wrote:
>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>> Fixes oops.
>>
>> That file doesn't even exist any more. What oops should this fix?
>
>
> Which file ?

ttm_page_alloc.c

I've rewritten the whole page pool from scratch upstream.

> We set dma_address to NULL in every other place after unmap. This is 
> so that,
> if the dma address was already unmapped, we skip it the next time we enter 
> ttm_unmap_and_unpopulate_pages
> with the same tt for some reason.

Dave and I already fixed that as well by having a flag preventing double 
unpopulate.

> The oops happens with IOMMU enabled. The device is removed from its 
> IOMMU group
> during PCI remove, but the BOs are all still alive if a user mode client 
> holds a reference to the drm file.
> Later, when the reference is dropped and device fini happens, I get an 
> oops in
> ttm_unmap_and_unpopulate_pages->dma_unmap_page because the IOMMU group 
> structures are gone already.
> Patch [11/12] drm/amdgpu: Register IOMMU topology notifier per device, 
> together with this patch, solves the oops.

It should be sufficient to unpopulate all BOs now.

Maybe you should rebase the patches on drm-misc-next.

Christian.

>
> Andrey
>
>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c 
>>> b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> index b40a467..b0df328 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
>>> @@ -1160,6 +1160,8 @@ void ttm_unmap_and_unpopulate_pages(struct 
>>> device *dev, struct ttm_dma_tt *tt)
>>>           dma_unmap_page(dev, tt->dma_address[i], num_pages * 
>>> PAGE_SIZE,
>>>                      DMA_BIDIRECTIONAL);
>>>   +        tt->dma_address[i] = 0;
>>> +
>>>           i += num_pages;
>>>       }
>>>       ttm_pool_unpopulate(&tt->ttm);
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-23  5:37       ` Andrey Grodzovsky
@ 2020-11-23  8:06         ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-23  8:06 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>
> On 11/22/20 6:57 AM, Christian König wrote:
>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>> No point to try recovery if device is gone, it's meaningless.
>>
>> I think that this should go into the device specific recovery 
>> function and not in the scheduler.
>
>
> The timeout timer is rearmed here, so this prevents any new recovery 
> work from restarting here
> after drm_dev_unplug has been executed from amdgpu_pci_remove. It will not 
> cover other places like
> job cleanup or starting a new job, but those should stop once the 
> scheduler thread is stopped later.

Yeah, but this is rather unclean. We should probably return an error 
code instead, indicating whether the timer should be rearmed or not.
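Christian's alternative can be sketched like this. All names here (`model_sched_stat`, `model_timedout_cb`, `model_sched_job_timedout`) are hypothetical and this is a user-space model: the driver's timeout callback reports a status, and the scheduler core decides from that status whether to rearm the timer, instead of the core calling drm_dev_enter() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Status returned by the driver's timedout callback. */
enum model_sched_stat {
	MODEL_SCHED_STAT_NOMINAL,  /* recovery ran, rearm the timeout timer */
	MODEL_SCHED_STAT_ENODEV,   /* device gone, do not rearm */
};

struct model_sched {
	bool device_present;
	int  timer_armed;
};

/* Driver side: detect the unplugged device and report it. */
static enum model_sched_stat model_timedout_cb(struct model_sched *s)
{
	if (!s->device_present)
		return MODEL_SCHED_STAT_ENODEV;
	/* ...device-specific reset/recovery would run here... */
	return MODEL_SCHED_STAT_NOMINAL;
}

/* Scheduler core: rearm the timer only on nominal status. */
static void model_sched_job_timedout(struct model_sched *s)
{
	if (model_timedout_cb(s) == MODEL_SCHED_STAT_NOMINAL)
		s->timer_armed = 1;
	else
		s->timer_armed = 0;
}
```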

Christian.

>
> Andrey
>
>
>>
>> Christian.
>>
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>   drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>   drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>   drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>   drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>   drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>   include/drm/gpu_scheduler.h               |  6 +++++-
>>>   7 files changed, 35 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> index d56f402..d0b0021 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct 
>>> amdgpu_ring *ring,
>>>             r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>                      num_hw_submission, amdgpu_job_hang_limit,
>>> -                   timeout, ring->name);
>>> +                   timeout, ring->name, &adev->ddev);
>>>           if (r) {
>>>               DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>                     ring->name);
>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> index cd46c88..7678287 100644
>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>         ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>                    etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>> +                 gpu->drm);
>>>       if (ret)
>>>           return ret;
>>>   diff --git a/drivers/gpu/drm/lima/lima_sched.c 
>>> b/drivers/gpu/drm/lima/lima_sched.c
>>> index dc6df9e..8a7e5d7ca 100644
>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe 
>>> *pipe, const char *name)
>>>         return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>                     lima_job_hang_limit, msecs_to_jiffies(timeout),
>>> -                  name);
>>> +                  name,
>>> +                  pipe->ldev->ddev);
>>>   }
>>>     void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> index 30e7b71..37b03b01 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device 
>>> *pfdev)
>>>           ret = drm_sched_init(&js->queue[j].sched,
>>>                        &panfrost_sched_ops,
>>>                        1, 0, msecs_to_jiffies(500),
>>> -                     "pan_js");
>>> +                     "pan_js", pfdev->ddev);
>>>           if (ret) {
>>>               dev_err(pfdev->dev, "Failed to create scheduler: %d.", 
>>> ret);
>>>               goto err_sched;
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index c3f0bd0..95db8c6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -53,6 +53,7 @@
>>>   #include <drm/drm_print.h>
>>>   #include <drm/gpu_scheduler.h>
>>>   #include <drm/spsc_queue.h>
>>> +#include <drm/drm_drv.h>
>>>     #define CREATE_TRACE_POINTS
>>>   #include "gpu_scheduler_trace.h"
>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct 
>>> work_struct *work)
>>>       struct drm_gpu_scheduler *sched;
>>>       struct drm_sched_job *job;
>>>   +    int idx;
>>> +
>>>       sched = container_of(work, struct drm_gpu_scheduler, 
>>> work_tdr.work);
>>>   +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>> +        DRM_INFO("%s - device unplugged skipping recovery on 
>>> scheduler:%s",
>>> +             __func__, sched->name);
>>> +        return;
>>> +    }
>>> +
>>>       /* Protects against concurrent deletion in 
>>> drm_sched_get_cleanup_job */
>>>       spin_lock(&sched->job_list_lock);
>>>       job = list_first_entry_or_null(&sched->ring_mirror_list,
>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct 
>>> work_struct *work)
>>>       spin_lock(&sched->job_list_lock);
>>>       drm_sched_start_timeout(sched);
>>>       spin_unlock(&sched->job_list_lock);
>>> +
>>> +    drm_dev_exit(idx);
>>>   }
>>>      /**
>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>              unsigned hw_submission,
>>>              unsigned hang_limit,
>>>              long timeout,
>>> -           const char *name)
>>> +           const char *name,
>>> +           struct drm_device *ddev)
>>>   {
>>>       int i, ret;
>>>       sched->ops = ops;
>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>       sched->name = name;
>>>       sched->timeout = timeout;
>>>       sched->hang_limit = hang_limit;
>>> +    sched->ddev = ddev;
>>>       for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; 
>>> i++)
>>>           drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>   diff --git a/drivers/gpu/drm/v3d/v3d_sched.c 
>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>> index 0747614..f5076e5 100644
>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                    &v3d_bin_sched_ops,
>>>                    hw_jobs_limit, job_hang_limit,
>>>                    msecs_to_jiffies(hang_limit_ms),
>>> -                 "v3d_bin");
>>> +                 "v3d_bin",
>>> +                 &v3d->drm);
>>>       if (ret) {
>>>           dev_err(v3d->drm.dev, "Failed to create bin scheduler: 
>>> %d.", ret);
>>>           return ret;
>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                    &v3d_render_sched_ops,
>>>                    hw_jobs_limit, job_hang_limit,
>>>                    msecs_to_jiffies(hang_limit_ms),
>>> -                 "v3d_render");
>>> +                 "v3d_render",
>>> +                 &v3d->drm);
>>>       if (ret) {
>>>           dev_err(v3d->drm.dev, "Failed to create render scheduler: 
>>> %d.",
>>>               ret);
>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                    &v3d_tfu_sched_ops,
>>>                    hw_jobs_limit, job_hang_limit,
>>>                    msecs_to_jiffies(hang_limit_ms),
>>> -                 "v3d_tfu");
>>> +                 "v3d_tfu",
>>> +                 &v3d->drm);
>>>       if (ret) {
>>>           dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>               ret);
>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                        &v3d_csd_sched_ops,
>>>                        hw_jobs_limit, job_hang_limit,
>>>                        msecs_to_jiffies(hang_limit_ms),
>>> -                     "v3d_csd");
>>> +                     "v3d_csd",
>>> +                     &v3d->drm);
>>>           if (ret) {
>>>               dev_err(v3d->drm.dev, "Failed to create CSD scheduler: 
>>> %d.",
>>>                   ret);
>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                        &v3d_cache_clean_sched_ops,
>>>                        hw_jobs_limit, job_hang_limit,
>>>                        msecs_to_jiffies(hang_limit_ms),
>>> -                     "v3d_cache_clean");
>>> +                     "v3d_cache_clean",
>>> +                     &v3d->drm);
>>>           if (ret) {
>>>               dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN 
>>> scheduler: %d.",
>>>                   ret);
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index 9243655..a980709 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -32,6 +32,7 @@
>>>     struct drm_gpu_scheduler;
>>>   struct drm_sched_rq;
>>> +struct drm_device;
>>>     /* These are often used as an (initial) index
>>>    * to an array, and as such should start at 0.
>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>    * @score: score to help loadbalancer pick an idle sched
>>>    * @ready: marks if the underlying HW is ready to work
>>>    * @free_guilty: A hint to the timeout handler to free the guilty job.
>>> + * @ddev: Pointer to drm device of this scheduler.
>>>    *
>>>    * One scheduler is implemented for each hardware ring.
>>>    */
>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>       atomic_t                        score;
>>>       bool                ready;
>>>       bool                free_guilty;
>>> +    struct drm_device        *ddev;
>>>   };
>>>     int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>              const struct drm_sched_backend_ops *ops,
>>>              uint32_t hw_submission, unsigned hang_limit, long timeout,
>>> -           const char *name);
>>> +           const char *name,
>>> +           struct drm_device *ddev);
>>>     void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>   int drm_sched_job_init(struct drm_sched_job *job,
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
@ 2020-11-23  8:06         ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-23  8:06 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh, ppaalanen, Harry.Wentland

Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>
> On 11/22/20 6:57 AM, Christian König wrote:
>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>> No point in trying recovery if the device is gone, it's meaningless.
>>
>> I think that this should go into the device specific recovery 
>> function and not in the scheduler.
>
>
> The timeout timer is rearmed here, so this prevents any new recovery 
> work from restarting after drm_dev_unplug was executed from 
> amdgpu_pci_remove. It will not cover other places like job cleanup or 
> starting new jobs, but those should stop once the scheduler thread is 
> stopped later.

Yeah, but this is rather unclean. We should probably return an error 
code instead, indicating whether the timer should be rearmed or not.

Christian.

>
> Andrey
>
>
>>
>> Christian.
>>
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>   drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>   drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>   drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>   drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>   drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>   include/drm/gpu_scheduler.h               |  6 +++++-
>>>   7 files changed, 35 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> index d56f402..d0b0021 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct 
>>> amdgpu_ring *ring,
>>>             r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>                      num_hw_submission, amdgpu_job_hang_limit,
>>> -                   timeout, ring->name);
>>> +                   timeout, ring->name, &adev->ddev);
>>>           if (r) {
>>>               DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>                     ring->name);
>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> index cd46c88..7678287 100644
>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>         ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>                    etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>> +                 gpu->drm);
>>>       if (ret)
>>>           return ret;
>>>   diff --git a/drivers/gpu/drm/lima/lima_sched.c 
>>> b/drivers/gpu/drm/lima/lima_sched.c
>>> index dc6df9e..8a7e5d7ca 100644
>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe 
>>> *pipe, const char *name)
>>>         return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>                     lima_job_hang_limit, msecs_to_jiffies(timeout),
>>> -                  name);
>>> +                  name,
>>> +                  pipe->ldev->ddev);
>>>   }
>>>     void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> index 30e7b71..37b03b01 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device 
>>> *pfdev)
>>>           ret = drm_sched_init(&js->queue[j].sched,
>>>                        &panfrost_sched_ops,
>>>                        1, 0, msecs_to_jiffies(500),
>>> -                     "pan_js");
>>> +                     "pan_js", pfdev->ddev);
>>>           if (ret) {
>>>               dev_err(pfdev->dev, "Failed to create scheduler: %d.", 
>>> ret);
>>>               goto err_sched;
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index c3f0bd0..95db8c6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -53,6 +53,7 @@
>>>   #include <drm/drm_print.h>
>>>   #include <drm/gpu_scheduler.h>
>>>   #include <drm/spsc_queue.h>
>>> +#include <drm/drm_drv.h>
>>>     #define CREATE_TRACE_POINTS
>>>   #include "gpu_scheduler_trace.h"
>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct 
>>> work_struct *work)
>>>       struct drm_gpu_scheduler *sched;
>>>       struct drm_sched_job *job;
>>>   +    int idx;
>>> +
>>>       sched = container_of(work, struct drm_gpu_scheduler, 
>>> work_tdr.work);
>>>   +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>> +        DRM_INFO("%s - device unplugged skipping recovery on 
>>> scheduler:%s",
>>> +             __func__, sched->name);
>>> +        return;
>>> +    }
>>> +
>>>       /* Protects against concurrent deletion in 
>>> drm_sched_get_cleanup_job */
>>>       spin_lock(&sched->job_list_lock);
>>>       job = list_first_entry_or_null(&sched->ring_mirror_list,
>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct 
>>> work_struct *work)
>>>       spin_lock(&sched->job_list_lock);
>>>       drm_sched_start_timeout(sched);
>>>       spin_unlock(&sched->job_list_lock);
>>> +
>>> +    drm_dev_exit(idx);
>>>   }
>>>      /**
>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>              unsigned hw_submission,
>>>              unsigned hang_limit,
>>>              long timeout,
>>> -           const char *name)
>>> +           const char *name,
>>> +           struct drm_device *ddev)
>>>   {
>>>       int i, ret;
>>>       sched->ops = ops;
>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>       sched->name = name;
>>>       sched->timeout = timeout;
>>>       sched->hang_limit = hang_limit;
>>> +    sched->ddev = ddev;
>>>       for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; 
>>> i++)
>>>           drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>   diff --git a/drivers/gpu/drm/v3d/v3d_sched.c 
>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>> index 0747614..f5076e5 100644
>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                    &v3d_bin_sched_ops,
>>>                    hw_jobs_limit, job_hang_limit,
>>>                    msecs_to_jiffies(hang_limit_ms),
>>> -                 "v3d_bin");
>>> +                 "v3d_bin",
>>> +                 &v3d->drm);
>>>       if (ret) {
>>>           dev_err(v3d->drm.dev, "Failed to create bin scheduler: 
>>> %d.", ret);
>>>           return ret;
>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                    &v3d_render_sched_ops,
>>>                    hw_jobs_limit, job_hang_limit,
>>>                    msecs_to_jiffies(hang_limit_ms),
>>> -                 "v3d_render");
>>> +                 "v3d_render",
>>> +                 &v3d->drm);
>>>       if (ret) {
>>>           dev_err(v3d->drm.dev, "Failed to create render scheduler: 
>>> %d.",
>>>               ret);
>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                    &v3d_tfu_sched_ops,
>>>                    hw_jobs_limit, job_hang_limit,
>>>                    msecs_to_jiffies(hang_limit_ms),
>>> -                 "v3d_tfu");
>>> +                 "v3d_tfu",
>>> +                 &v3d->drm);
>>>       if (ret) {
>>>           dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>               ret);
>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                        &v3d_csd_sched_ops,
>>>                        hw_jobs_limit, job_hang_limit,
>>>                        msecs_to_jiffies(hang_limit_ms),
>>> -                     "v3d_csd");
>>> +                     "v3d_csd",
>>> +                     &v3d->drm);
>>>           if (ret) {
>>>               dev_err(v3d->drm.dev, "Failed to create CSD scheduler: 
>>> %d.",
>>>                   ret);
>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>                        &v3d_cache_clean_sched_ops,
>>>                        hw_jobs_limit, job_hang_limit,
>>>                        msecs_to_jiffies(hang_limit_ms),
>>> -                     "v3d_cache_clean");
>>> +                     "v3d_cache_clean",
>>> +                     &v3d->drm);
>>>           if (ret) {
>>>               dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN 
>>> scheduler: %d.",
>>>                   ret);
>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>> index 9243655..a980709 100644
>>> --- a/include/drm/gpu_scheduler.h
>>> +++ b/include/drm/gpu_scheduler.h
>>> @@ -32,6 +32,7 @@
>>>     struct drm_gpu_scheduler;
>>>   struct drm_sched_rq;
>>> +struct drm_device;
>>>     /* These are often used as an (initial) index
>>>    * to an array, and as such should start at 0.
>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>    * @score: score to help loadbalancer pick an idle sched
>>>    * @ready: marks if the underlying HW is ready to work
>>>    * @free_guilty: A hint to the timeout handler to free the guilty job.
>>> + * @ddev: Pointer to drm device of this scheduler.
>>>    *
>>>    * One scheduler is implemented for each hardware ring.
>>>    */
>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>       atomic_t                        score;
>>>       bool                ready;
>>>       bool                free_guilty;
>>> +    struct drm_device        *ddev;
>>>   };
>>>     int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>              const struct drm_sched_backend_ops *ops,
>>>              uint32_t hw_submission, unsigned hang_limit, long timeout,
>>> -           const char *name);
>>> +           const char *name,
>>> +           struct drm_device *ddev);
>>>     void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>   int drm_sched_job_init(struct drm_sched_job *job,
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-25 10:42     ` Christian König
@ 2020-11-23 20:05       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23 20:05 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/25/20 5:42 AM, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>> It's needed to drop iommu backed pages on device unplug
>> before device's IOMMU group is released.
>
> It would be cleaner if we could do the whole handling in TTM. I also need to 
> double check what you are doing with this function.
>
> Christian.


Check patch "drm/amdgpu: Register IOMMU topology notifier per device." to see
how I use it. I don't see why this should go into the TTM mid-layer - the
stuff I do inside is vendor specific, and I also don't think TTM is
explicitly aware of IOMMU? Do you mean you prefer the IOMMU notifier to be
registered from within TTM and then use a hook to call into a vendor
specific handler?

Andrey


>
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>> index 1ccf1ef..29248a5 100644
>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>>       else
>>           ttm_pool_unpopulate(ttm);
>>   }
>> +EXPORT_SYMBOL(ttm_tt_unpopulate);
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-23 20:05       ` Andrey Grodzovsky
@ 2020-11-23 20:20         ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-23 20:20 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>
> On 11/25/20 5:42 AM, Christian König wrote:
>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>> It's needed to drop iommu backed pages on device unplug
>>> before device's IOMMU group is released.
>>
>> It would be cleaner if we could do the whole handling in TTM. I also 
>> need to double check what you are doing with this function.
>>
>> Christian.
>
>
> Check patch "drm/amdgpu: Register IOMMU topology notifier per device." 
> to see
> how i use it. I don't see why this should go into TTM mid-layer - the 
> stuff I do inside
> is vendor specific and also I don't think TTM is explicitly aware of 
> IOMMU ?
> Do you mean you prefer the IOMMU notifier to be registered from within 
> TTM
> and then use a hook to call into vendor specific handler ?

No, that is really vendor specific.

What I meant is to have a function like ttm_resource_manager_evict_all() 
which you only need to call and all tt objects are unpopulated.

Give me a day or two to look into this.

Christian.

>
> Andrey
>
>
>>
>>>
>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>> ---
>>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c 
>>> b/drivers/gpu/drm/ttm/ttm_tt.c
>>> index 1ccf1ef..29248a5 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>>>       else
>>>           ttm_pool_unpopulate(ttm);
>>>   }
>>> +EXPORT_SYMBOL(ttm_tt_unpopulate);
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-23 20:20         ` Christian König
@ 2020-11-23 20:38           ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23 20:38 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/23/20 3:20 PM, Christian König wrote:
> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>
>> On 11/25/20 5:42 AM, Christian König wrote:
>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>> It's needed to drop iommu backed pages on device unplug
>>>> before device's IOMMU group is released.
>>>
>>> It would be cleaner if we could do the whole handling in TTM. I also need to 
>>> double check what you are doing with this function.
>>>
>>> Christian.
>>
>>
>> Check patch "drm/amdgpu: Register IOMMU topology notifier per device." to see
>> how i use it. I don't see why this should go into TTM mid-layer - the stuff I 
>> do inside
>> is vendor specific and also I don't think TTM is explicitly aware of IOMMU ?
>> Do you mean you prefer the IOMMU notifier to be registered from within TTM
>> and then use a hook to call into vendor specific handler ?
>
> No, that is really vendor specific.
>
> What I meant is to have a function like ttm_resource_manager_evict_all() which 
> you only need to call and all tt objects are unpopulated.


So instead of the BO list I create and later iterate in amdgpu in the IOMMU 
patch, you just want to do it within TTM with a single function? Makes much 
more sense.

Andrey


>
> Give me a day or two to look into this.
>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>>> index 1ccf1ef..29248a5 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>>>>       else
>>>>           ttm_pool_unpopulate(ttm);
>>>>   }
>>>> +EXPORT_SYMBOL(ttm_tt_unpopulate);
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-11-23 20:38           ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23 20:38 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh, ppaalanen, Harry.Wentland


On 11/23/20 3:20 PM, Christian König wrote:
> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>
>> On 11/25/20 5:42 AM, Christian König wrote:
>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>> It's needed to drop iommu backed pages on device unplug
>>>> before device's IOMMU group is released.
>>>
>>> It would be cleaner if we could do the whole handling in TTM. I also need to 
>>> double check what you are doing with this function.
>>>
>>> Christian.
>>
>>
>> Check patch "drm/amdgpu: Register IOMMU topology notifier per device." to see
>> how i use it. I don't see why this should go into TTM mid-layer - the stuff I 
>> do inside
>> is vendor specific and also I don't think TTM is explicitly aware of IOMMU ?
>> Do you mean you prefer the IOMMU notifier to be registered from within TTM
>> and then use a hook to call into vendor specific handler ?
>
> No, that is really vendor specific.
>
> What I meant is to have a function like ttm_resource_manager_evict_all() which 
> you only need to call and all tt objects are unpopulated.


So instead of this BO list i create and later iterate in amdgpu from the IOMMU 
patch you just want to do it within
TTM with a single function ? Makes much more sense.

Andrey


>
> Give me a day or two to look into this.
>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>>>   1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>>> index 1ccf1ef..29248a5 100644
>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>>>>       else
>>>>           ttm_pool_unpopulate(ttm);
>>>>   }
>>>> +EXPORT_SYMBOL(ttm_tt_unpopulate);
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-23 20:38           ` Andrey Grodzovsky
@ 2020-11-23 20:41             ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-23 20:41 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>
> On 11/23/20 3:20 PM, Christian König wrote:
>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>
>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>> It's needed to drop iommu backed pages on device unplug
>>>>> before device's IOMMU group is released.
>>>>
>>>> It would be cleaner if we could do the whole handling in TTM. I 
>>>> also need to double check what you are doing with this function.
>>>>
>>>> Christian.
>>>
>>>
>>> Check patch "drm/amdgpu: Register IOMMU topology notifier per 
>>> device." to see
>>> how i use it. I don't see why this should go into TTM mid-layer - 
>>> the stuff I do inside
>>> is vendor specific and also I don't think TTM is explicitly aware of 
>>> IOMMU ?
>>> Do you mean you prefer the IOMMU notifier to be registered from 
>>> within TTM
>>> and then use a hook to call into vendor specific handler ?
>>
>> No, that is really vendor specific.
>>
>> What I meant is to have a function like 
>> ttm_resource_manager_evict_all() which you only need to call and all 
>> tt objects are unpopulated.
>
>
> So instead of this BO list i create and later iterate in amdgpu from 
> the IOMMU patch you just want to do it within
> TTM with a single function ? Makes much more sense.

Yes, exactly.

The list_empty() checks we have in TTM for the LRU are actually not the 
best idea; we should now check the pin_count instead. This way we could 
also have a list of the pinned BOs in TTM.

BTW: Have you thought about what happens when we unpopulate a BO while 
we still try to use a kernel mapping for it? That could have unforeseen 
consequences.

Christian.
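
[Editorial sketch] The pin_count point above can be made concrete with a tiny model: pinned BOs (pin_count > 0) are not on the eviction LRU, so a device-wide teardown that only walks the LRU misses them, and BOs must be classified by pin_count. All names here are illustrative, not the real TTM structures.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bo {
	int pin_count;   /* > 0 means pinned and therefore off the LRU */
};

/* Classify a device's BOs by pin_count: unpinned BOs belong on the
 * eviction LRU, pinned ones on a separate pinned list. A full teardown
 * must cover both sets. */
static void toy_partition(const struct toy_bo *all, size_t n,
			  int *n_lru, int *n_pinned)
{
	*n_lru = 0;
	*n_pinned = 0;
	for (size_t i = 0; i < n; i++) {
		if (all[i].pin_count > 0)
			(*n_pinned)++;   /* would go on a pinned list */
		else
			(*n_lru)++;      /* stays on the eviction LRU */
	}
}
```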

>
> Andrey
>
>
>>
>> Give me a day or two to look into this.
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>>>>   1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c 
>>>>> b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> index 1ccf1ef..29248a5 100644
>>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>>>>>       else
>>>>>           ttm_pool_unpopulate(ttm);
>>>>>   }
>>>>> +EXPORT_SYMBOL(ttm_tt_unpopulate);
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C9be029f26a4746347a6108d88fed299b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637417596065559955%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=tZ3do%2FeKzBtRlNaFbBjCtRvUHKdvwDZ7SoYhEBu4%2BT8%3D&amp;reserved=0 
>>>
>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-23 20:41             ` Christian König
@ 2020-11-23 21:08               ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-23 21:08 UTC (permalink / raw)
  To: Christian König, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/23/20 3:41 PM, Christian König wrote:
> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>
>> On 11/23/20 3:20 PM, Christian König wrote:
>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>
>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>> before device's IOMMU group is released.
>>>>>
>>>>> It would be cleaner if we could do the whole handling in TTM. I also need 
>>>>> to double check what you are doing with this function.
>>>>>
>>>>> Christian.
>>>>
>>>>
>>>> Check patch "drm/amdgpu: Register IOMMU topology notifier per device." to see
>>>> how i use it. I don't see why this should go into TTM mid-layer - the stuff 
>>>> I do inside
>>>> is vendor specific and also I don't think TTM is explicitly aware of IOMMU ?
>>>> Do you mean you prefer the IOMMU notifier to be registered from within TTM
>>>> and then use a hook to call into vendor specific handler ?
>>>
>>> No, that is really vendor specific.
>>>
>>> What I meant is to have a function like ttm_resource_manager_evict_all() 
>>> which you only need to call and all tt objects are unpopulated.
>>
>>
>> So instead of this BO list i create and later iterate in amdgpu from the 
>> IOMMU patch you just want to do it within
>> TTM with a single function ? Makes much more sense.
>
> Yes, exactly.
>
> The list_empty() checks we have in TTM for the LRU are actually not the best 
> idea, we should now check the pin_count instead. This way we could also have a 
> list of the pinned BOs in TTM.


So from my IOMMU topology handler I will iterate the TTM LRU for the unpinned 
BOs and use this new function for the pinned ones?
It's probably a good idea to combine both iterations into this new function to 
cover all the BOs allocated on the device.


>
> BTW: Have you thought about what happens when we unpopulate a BO while we 
> still try to use a kernel mapping for it? That could have unforeseen 
> consequences.


Are you asking what happens to kmap- or vmap-style CPU accesses once we 
drop all the DMA backing pages for a particular BO? For user mappings
(mmap) we took care of this with the dummy-page reroute, but indeed nothing was 
done for in-kernel CPU mappings.

Andrey
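
[Editorial sketch] The "dummy page reroute" mentioned above can be modeled simply: once the device is gone, CPU faults on a device-backed mapping are resolved to one shared dummy page instead of the freed backing page, so the application reads stale data but does not crash. The names are illustrative; the real code lives in the driver's VM fault handler.

```c
#include <assert.h>

static char real_page[4096];   /* stands in for the device-backed page */
static char dummy_page[4096];  /* shared scratch page after unplug */
static int device_unplugged;

/* Fault-handler model: choose which page backs the faulting address. */
static char *toy_fault_resolve(void)
{
	return device_unplugged ? dummy_page : real_page;
}
```

The open question in the mail above is that kmap/vmap users hold direct kernel pointers, so no fault handler runs for them and this reroute cannot help.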


>
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>> Give me a day or two to look into this.
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>>>>>   1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>> index 1ccf1ef..29248a5 100644
>>>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>>>>>>       else
>>>>>>           ttm_pool_unpopulate(ttm);
>>>>>>   }
>>>>>> +EXPORT_SYMBOL(ttm_tt_unpopulate);
>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-23  8:06         ` Christian König
@ 2020-11-24  1:12           ` Luben Tuikov
  -1 siblings, 0 replies; 212+ messages in thread
From: Luben Tuikov @ 2020-11-24  1:12 UTC (permalink / raw)
  To: christian.koenig, Andrey Grodzovsky, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

On 2020-11-23 3:06 a.m., Christian König wrote:
> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>
>> On 11/22/20 6:57 AM, Christian König wrote:
>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>> No point to try recovery if device is gone, it's meaningless.
>>>
>>> I think that this should go into the device specific recovery 
>>> function and not in the scheduler.
>>
>>
>> The timeout timer is rearmed here, so this prevents any new recovery 
>> work from restarting from here
>> after drm_dev_unplug was executed from amdgpu_pci_remove. It will not 
>> cover other places like
>> job cleanup or starting a new job, but those should stop once the 
>> scheduler thread is stopped later.
> 
> Yeah, but this is rather unclean. We should probably return an error 
> code instead if the timer should be rearmed or not.

Christian, this is exactly the work I told you about
last Wednesday in our weekly meeting, and
which I wrote to you about in an email around this
time last year.

So what do we do now?

I can submit those changes without the last part,
which builds on this change.

I'm still testing the last part, and was hoping
to submit it all in one sequence of patches
once that testing is done.

Regards,
Luben
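
[Editorial sketch] Christian's suggestion above — have the timeout handler report whether the timer should be rearmed instead of silently bailing out — can be sketched as a status-returning callback. The enum and function names below are hypothetical, chosen only to illustrate the contract between driver and scheduler.

```c
#include <assert.h>

/* Model: the driver's timedout callback returns a status, and the
 * scheduler rearms the timeout timer only when recovery is still
 * meaningful (i.e. the device is still present). */

enum toy_sched_stat {
	TOY_SCHED_STAT_NOMINAL,   /* recovery ran, rearm the timer */
	TOY_SCHED_STAT_ENODEV,    /* device unplugged, do not rearm */
};

static enum toy_sched_stat toy_job_timedout(int device_present)
{
	if (!device_present)
		return TOY_SCHED_STAT_ENODEV;
	/* ... device-specific reset/recovery would run here ... */
	return TOY_SCHED_STAT_NOMINAL;
}

/* Scheduler side: rearm only on NOMINAL. */
static int toy_should_rearm(enum toy_sched_stat s)
{
	return s == TOY_SCHED_STAT_NOMINAL;
}
```

This keeps the unplug policy in the driver callback while the scheduler core stays mechanism-only, which is the layering concern raised earlier in the thread.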

> 
> Christian.
> 
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>   drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>   drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>   drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>   drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>   drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>   include/drm/gpu_scheduler.h               |  6 +++++-
>>>>   7 files changed, 35 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> index d56f402..d0b0021 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct 
>>>> amdgpu_ring *ring,
>>>>             r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>                      num_hw_submission, amdgpu_job_hang_limit,
>>>> -                   timeout, ring->name);
>>>> +                   timeout, ring->name, &adev->ddev);
>>>>           if (r) {
>>>>               DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>                     ring->name);
>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c 
>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>> index cd46c88..7678287 100644
>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>         ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>                    etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>> +                 gpu->drm);
>>>>       if (ret)
>>>>           return ret;
>>>>   diff --git a/drivers/gpu/drm/lima/lima_sched.c 
>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>> index dc6df9e..8a7e5d7ca 100644
>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe 
>>>> *pipe, const char *name)
>>>>         return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>                     lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>> -                  name);
>>>> +                  name,
>>>> +                  pipe->ldev->ddev);
>>>>   }
>>>>     void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>> index 30e7b71..37b03b01 100644
>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device 
>>>> *pfdev)
>>>>           ret = drm_sched_init(&js->queue[j].sched,
>>>>                        &panfrost_sched_ops,
>>>>                        1, 0, msecs_to_jiffies(500),
>>>> -                     "pan_js");
>>>> +                     "pan_js", pfdev->ddev);
>>>>           if (ret) {
>>>>               dev_err(pfdev->dev, "Failed to create scheduler: %d.", 
>>>> ret);
>>>>               goto err_sched;
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index c3f0bd0..95db8c6 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -53,6 +53,7 @@
>>>>   #include <drm/drm_print.h>
>>>>   #include <drm/gpu_scheduler.h>
>>>>   #include <drm/spsc_queue.h>
>>>> +#include <drm/drm_drv.h>
>>>>     #define CREATE_TRACE_POINTS
>>>>   #include "gpu_scheduler_trace.h"
>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct 
>>>> work_struct *work)
>>>>       struct drm_gpu_scheduler *sched;
>>>>       struct drm_sched_job *job;
>>>>   +    int idx;
>>>> +
>>>>       sched = container_of(work, struct drm_gpu_scheduler, 
>>>> work_tdr.work);
>>>>   +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>> +        DRM_INFO("%s - device unplugged skipping recovery on 
>>>> scheduler:%s",
>>>> +             __func__, sched->name);
>>>> +        return;
>>>> +    }
>>>> +
>>>>       /* Protects against concurrent deletion in 
>>>> drm_sched_get_cleanup_job */
>>>>       spin_lock(&sched->job_list_lock);
>>>>       job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct 
>>>> work_struct *work)
>>>>       spin_lock(&sched->job_list_lock);
>>>>       drm_sched_start_timeout(sched);
>>>>       spin_unlock(&sched->job_list_lock);
>>>> +
>>>> +    drm_dev_exit(idx);
>>>>   }
>>>>      /**
>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>              unsigned hw_submission,
>>>>              unsigned hang_limit,
>>>>              long timeout,
>>>> -           const char *name)
>>>> +           const char *name,
>>>> +           struct drm_device *ddev)
>>>>   {
>>>>       int i, ret;
>>>>       sched->ops = ops;
>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>       sched->name = name;
>>>>       sched->timeout = timeout;
>>>>       sched->hang_limit = hang_limit;
>>>> +    sched->ddev = ddev;
>>>>       for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; 
>>>> i++)
>>>>           drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>   diff --git a/drivers/gpu/drm/v3d/v3d_sched.c 
>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>> index 0747614..f5076e5 100644
>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                    &v3d_bin_sched_ops,
>>>>                    hw_jobs_limit, job_hang_limit,
>>>>                    msecs_to_jiffies(hang_limit_ms),
>>>> -                 "v3d_bin");
>>>> +                 "v3d_bin",
>>>> +                 &v3d->drm);
>>>>       if (ret) {
>>>>           dev_err(v3d->drm.dev, "Failed to create bin scheduler: 
>>>> %d.", ret);
>>>>           return ret;
>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                    &v3d_render_sched_ops,
>>>>                    hw_jobs_limit, job_hang_limit,
>>>>                    msecs_to_jiffies(hang_limit_ms),
>>>> -                 "v3d_render");
>>>> +                 "v3d_render",
>>>> +                 &v3d->drm);
>>>>       if (ret) {
>>>>           dev_err(v3d->drm.dev, "Failed to create render scheduler: 
>>>> %d.",
>>>>               ret);
>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                    &v3d_tfu_sched_ops,
>>>>                    hw_jobs_limit, job_hang_limit,
>>>>                    msecs_to_jiffies(hang_limit_ms),
>>>> -                 "v3d_tfu");
>>>> +                 "v3d_tfu",
>>>> +                 &v3d->drm);
>>>>       if (ret) {
>>>>           dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>               ret);
>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                        &v3d_csd_sched_ops,
>>>>                        hw_jobs_limit, job_hang_limit,
>>>>                        msecs_to_jiffies(hang_limit_ms),
>>>> -                     "v3d_csd");
>>>> +                     "v3d_csd",
>>>> +                     &v3d->drm);
>>>>           if (ret) {
>>>>               dev_err(v3d->drm.dev, "Failed to create CSD scheduler: 
>>>> %d.",
>>>>                   ret);
>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                        &v3d_cache_clean_sched_ops,
>>>>                        hw_jobs_limit, job_hang_limit,
>>>>                        msecs_to_jiffies(hang_limit_ms),
>>>> -                     "v3d_cache_clean");
>>>> +                     "v3d_cache_clean",
>>>> +                     &v3d->drm);
>>>>           if (ret) {
>>>>               dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN 
>>>> scheduler: %d.",
>>>>                   ret);
>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>> index 9243655..a980709 100644
>>>> --- a/include/drm/gpu_scheduler.h
>>>> +++ b/include/drm/gpu_scheduler.h
>>>> @@ -32,6 +32,7 @@
>>>>     struct drm_gpu_scheduler;
>>>>   struct drm_sched_rq;
>>>> +struct drm_device;
>>>>     /* These are often used as an (initial) index
>>>>    * to an array, and as such should start at 0.
>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>    * @score: score to help loadbalancer pick a idle sched
>>>>    * @ready: marks if the underlying HW is ready to work
>>>>    * @free_guilty: A hit to time out handler to free the guilty job.
>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>    *
>>>>    * One scheduler is implemented for each hardware ring.
>>>>    */
>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>       atomic_t                        score;
>>>>       bool                ready;
>>>>       bool                free_guilty;
>>>> +    struct drm_device        *ddev;
>>>>   };
>>>>     int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>              const struct drm_sched_backend_ops *ops,
>>>>              uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>> -           const char *name);
>>>> +           const char *name,
>>>> +           struct drm_device *ddev);
>>>>     void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>   int drm_sched_job_init(struct drm_sched_job *job,
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

>>>>           if (ret) {
>>>>               dev_err(pfdev->dev, "Failed to create scheduler: %d.", 
>>>> ret);
>>>>               goto err_sched;
>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>> index c3f0bd0..95db8c6 100644
>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>> @@ -53,6 +53,7 @@
>>>>   #include <drm/drm_print.h>
>>>>   #include <drm/gpu_scheduler.h>
>>>>   #include <drm/spsc_queue.h>
>>>> +#include <drm/drm_drv.h>
>>>>     #define CREATE_TRACE_POINTS
>>>>   #include "gpu_scheduler_trace.h"
>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct 
>>>> work_struct *work)
>>>>       struct drm_gpu_scheduler *sched;
>>>>       struct drm_sched_job *job;
>>>>   +    int idx;
>>>> +
>>>>       sched = container_of(work, struct drm_gpu_scheduler, 
>>>> work_tdr.work);
>>>>   +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>> +        DRM_INFO("%s - device unplugged skipping recovery on 
>>>> scheduler:%s",
>>>> +             __func__, sched->name);
>>>> +        return;
>>>> +    }
>>>> +
>>>>       /* Protects against concurrent deletion in 
>>>> drm_sched_get_cleanup_job */
>>>>       spin_lock(&sched->job_list_lock);
>>>>       job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct 
>>>> work_struct *work)
>>>>       spin_lock(&sched->job_list_lock);
>>>>       drm_sched_start_timeout(sched);
>>>>       spin_unlock(&sched->job_list_lock);
>>>> +
>>>> +    drm_dev_exit(idx);
>>>>   }
>>>>      /**
>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>              unsigned hw_submission,
>>>>              unsigned hang_limit,
>>>>              long timeout,
>>>> -           const char *name)
>>>> +           const char *name,
>>>> +           struct drm_device *ddev)
>>>>   {
>>>>       int i, ret;
>>>>       sched->ops = ops;
>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>       sched->name = name;
>>>>       sched->timeout = timeout;
>>>>       sched->hang_limit = hang_limit;
>>>> +    sched->ddev = ddev;
>>>>       for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT; 
>>>> i++)
>>>>           drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>   diff --git a/drivers/gpu/drm/v3d/v3d_sched.c 
>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>> index 0747614..f5076e5 100644
>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                    &v3d_bin_sched_ops,
>>>>                    hw_jobs_limit, job_hang_limit,
>>>>                    msecs_to_jiffies(hang_limit_ms),
>>>> -                 "v3d_bin");
>>>> +                 "v3d_bin",
>>>> +                 &v3d->drm);
>>>>       if (ret) {
>>>>           dev_err(v3d->drm.dev, "Failed to create bin scheduler: 
>>>> %d.", ret);
>>>>           return ret;
>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                    &v3d_render_sched_ops,
>>>>                    hw_jobs_limit, job_hang_limit,
>>>>                    msecs_to_jiffies(hang_limit_ms),
>>>> -                 "v3d_render");
>>>> +                 "v3d_render",
>>>> +                 &v3d->drm);
>>>>       if (ret) {
>>>>           dev_err(v3d->drm.dev, "Failed to create render scheduler: 
>>>> %d.",
>>>>               ret);
>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                    &v3d_tfu_sched_ops,
>>>>                    hw_jobs_limit, job_hang_limit,
>>>>                    msecs_to_jiffies(hang_limit_ms),
>>>> -                 "v3d_tfu");
>>>> +                 "v3d_tfu",
>>>> +                 &v3d->drm);
>>>>       if (ret) {
>>>>           dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>               ret);
>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                        &v3d_csd_sched_ops,
>>>>                        hw_jobs_limit, job_hang_limit,
>>>>                        msecs_to_jiffies(hang_limit_ms),
>>>> -                     "v3d_csd");
>>>> +                     "v3d_csd",
>>>> +                     &v3d->drm);
>>>>           if (ret) {
>>>>               dev_err(v3d->drm.dev, "Failed to create CSD scheduler: 
>>>> %d.",
>>>>                   ret);
>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>                        &v3d_cache_clean_sched_ops,
>>>>                        hw_jobs_limit, job_hang_limit,
>>>>                        msecs_to_jiffies(hang_limit_ms),
>>>> -                     "v3d_cache_clean");
>>>> +                     "v3d_cache_clean",
>>>> +                     &v3d->drm);
>>>>           if (ret) {
>>>>               dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN 
>>>> scheduler: %d.",
>>>>                   ret);
>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>> index 9243655..a980709 100644
>>>> --- a/include/drm/gpu_scheduler.h
>>>> +++ b/include/drm/gpu_scheduler.h
>>>> @@ -32,6 +32,7 @@
>>>>     struct drm_gpu_scheduler;
>>>>   struct drm_sched_rq;
>>>> +struct drm_device;
>>>>     /* These are often used as an (initial) index
>>>>    * to an array, and as such should start at 0.
>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>    * @score: score to help loadbalancer pick a idle sched
>>>>    * @ready: marks if the underlying HW is ready to work
>>>>    * @free_guilty: A hit to time out handler to free the guilty job.
>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>    *
>>>>    * One scheduler is implemented for each hardware ring.
>>>>    */
>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>       atomic_t                        score;
>>>>       bool                ready;
>>>>       bool                free_guilty;
>>>> +    struct drm_device        *ddev;
>>>>   };
>>>>     int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>              const struct drm_sched_backend_ops *ops,
>>>>              uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>> -           const char *name);
>>>> +           const char *name,
>>>> +           struct drm_device *ddev);
>>>>     void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>   int drm_sched_job_init(struct drm_sched_job *job,
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-23 21:08               ` Andrey Grodzovsky
@ 2020-11-24  7:41                 ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-24  7:41 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
>
> On 11/23/20 3:41 PM, Christian König wrote:
>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>>
>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>>
>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>> before device's IOMMU group is released.
>>>>>>
>>>>>> It would be cleaner if we could do the whole handling in TTM. I 
>>>>>> also need to double check what you are doing with this function.
>>>>>>
>>>>>> Christian.
>>>>>
>>>>>
>>>>> Check patch "drm/amdgpu: Register IOMMU topology notifier per 
>>>>> device." to see
>>>>> how I use it. I don't see why this should go into TTM mid-layer - 
>>>>> the stuff I do inside
>>>>> is vendor specific and also I don't think TTM is explicitly aware 
>>>>> of IOMMU ?
>>>>> Do you mean you prefer the IOMMU notifier to be registered from 
>>>>> within TTM
>>>>> and then use a hook to call into vendor specific handler ?
>>>>
>>>> No, that is really vendor specific.
>>>>
>>>> What I meant is to have a function like 
>>>> ttm_resource_manager_evict_all() which you only need to call and 
>>>> all tt objects are unpopulated.
>>>
>>>
>>> So instead of the BO list I create and later iterate in amdgpu from 
>>> the IOMMU patch, you just want to do it within
>>> TTM with a single function? Makes much more sense.
>>
>> Yes, exactly.
>>
>> The list_empty() checks we have in TTM for the LRU are actually not 
>> the best idea, we should now check the pin_count instead. This way we 
>> could also have a list of the pinned BOs in TTM.
>
>
> So from my IOMMU topology handler I will iterate the TTM LRU for the 
> unpinned BOs and this new function for the pinned ones?
> It's probably a good idea to combine both iterations into this new 
> function to cover all the BOs allocated on the device.

Yes, that's what I had in my mind as well.

>
>
>>
>> BTW: Have you thought about what happens when we unpopulate a BO 
>> while we still try to use a kernel mapping for it? That could have 
>> unforeseen consequences.
>
>
> Are you asking what happens to kmap or vmap style mapped CPU accesses 
> once we drop all the DMA backing pages for a particular BO ? Because 
> for user mappings
> (mmap) we took care of this with dummy page reroute but indeed nothing 
> was done for in kernel CPU mappings.

Yes exactly that.

In other words what happens if we free the ring buffer while the kernel 
still writes to it?

Christian.

>
> Andrey
>
>
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>>
>>>>
>>>> Give me a day or two to look into this.
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>>>>>>   1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c 
>>>>>>> b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>>> index 1ccf1ef..29248a5 100644
>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>>> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>>>>>>>       else
>>>>>>>           ttm_pool_unpopulate(ttm);
>>>>>>>   }
>>>>>>> +EXPORT_SYMBOL(ttm_tt_unpopulate);
>>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>
>>>>
>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-24  1:12           ` Luben Tuikov
@ 2020-11-24  7:50             ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-24  7:50 UTC (permalink / raw)
  To: Luben Tuikov, christian.koenig, Andrey Grodzovsky, amd-gfx,
	dri-devel, daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 24.11.20 um 02:12 schrieb Luben Tuikov:
> On 2020-11-23 3:06 a.m., Christian König wrote:
>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>> No point to try recovery if device is gone, it's meaningless.
>>>> I think that this should go into the device specific recovery
>>>> function and not in the scheduler.
>>>
>>> The timeout timer is rearmed here, so this prevents any new recovery
>>> work to restart from here
>>> after drm_dev_unplug was executed from amdgpu_pci_remove. It will not
>>> cover other places like
>>> job cleanup or starting new job but those should stop once the
>>> scheduler thread is stopped later.
>> Yeah, but this is rather unclean. We should probably return an error
>> code instead if the timer should be rearmed or not.
> Christian, this is exactly my work I told you about
> last week on Wednesday in our weekly meeting. And
> which I wrote to you in an email last year about this
> time.

Yeah, that's why I'm suggesting it here as well.

> So what do we do now?

Split your patches into smaller parts and submit them chunk by chunk.

E.g. renames first and then functional changes grouped by area they change.

Regards,
Christian.

>
> I can submit those changes without the last part,
> which builds on this change.
>
> I'm still testing the last part and was hoping
> to submit it all in one sequence of patches,
> after my testing.
>
> Regards,
> Luben
>
>> Christian.
>>
>>> Andrey
>>>
>>>
>>>> Christian.
>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>    drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>    drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>    drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>    drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>    include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>    7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> index d56f402..d0b0021 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>> amdgpu_ring *ring,
>>>>>              r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>                       num_hw_submission, amdgpu_job_hang_limit,
>>>>> -                   timeout, ring->name);
>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>            if (r) {
>>>>>                DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>                      ring->name);
>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> index cd46c88..7678287 100644
>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>          ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>                     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>> +                 gpu->drm);
>>>>>        if (ret)
>>>>>            return ret;
>>>>>    diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>> *pipe, const char *name)
>>>>>          return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>                      lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>> -                  name);
>>>>> +                  name,
>>>>> +                  pipe->ldev->ddev);
>>>>>    }
>>>>>      void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> index 30e7b71..37b03b01 100644
>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>> *pfdev)
>>>>>            ret = drm_sched_init(&js->queue[j].sched,
>>>>>                         &panfrost_sched_ops,
>>>>>                         1, 0, msecs_to_jiffies(500),
>>>>> -                     "pan_js");
>>>>> +                     "pan_js", pfdev->ddev);
>>>>>            if (ret) {
>>>>>                dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>> ret);
>>>>>                goto err_sched;
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index c3f0bd0..95db8c6 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -53,6 +53,7 @@
>>>>>    #include <drm/drm_print.h>
>>>>>    #include <drm/gpu_scheduler.h>
>>>>>    #include <drm/spsc_queue.h>
>>>>> +#include <drm/drm_drv.h>
>>>>>      #define CREATE_TRACE_POINTS
>>>>>    #include "gpu_scheduler_trace.h"
>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>> work_struct *work)
>>>>>        struct drm_gpu_scheduler *sched;
>>>>>        struct drm_sched_job *job;
>>>>>    +    int idx;
>>>>> +
>>>>>        sched = container_of(work, struct drm_gpu_scheduler,
>>>>> work_tdr.work);
>>>>>    +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>> scheduler:%s",
>>>>> +             __func__, sched->name);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>        /* Protects against concurrent deletion in
>>>>> drm_sched_get_cleanup_job */
>>>>>        spin_lock(&sched->job_list_lock);
>>>>>        job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>> work_struct *work)
>>>>>        spin_lock(&sched->job_list_lock);
>>>>>        drm_sched_start_timeout(sched);
>>>>>        spin_unlock(&sched->job_list_lock);
>>>>> +
>>>>> +    drm_dev_exit(idx);
>>>>>    }
>>>>>       /**
>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>               unsigned hw_submission,
>>>>>               unsigned hang_limit,
>>>>>               long timeout,
>>>>> -           const char *name)
>>>>> +           const char *name,
>>>>> +           struct drm_device *ddev)
>>>>>    {
>>>>>        int i, ret;
>>>>>        sched->ops = ops;
>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>        sched->name = name;
>>>>>        sched->timeout = timeout;
>>>>>        sched->hang_limit = hang_limit;
>>>>> +    sched->ddev = ddev;
>>>>>        for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>> i++)
>>>>>            drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>    diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> index 0747614..f5076e5 100644
>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                     &v3d_bin_sched_ops,
>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>> -                 "v3d_bin");
>>>>> +                 "v3d_bin",
>>>>> +                 &v3d->drm);
>>>>>        if (ret) {
>>>>>            dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>> %d.", ret);
>>>>>            return ret;
>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                     &v3d_render_sched_ops,
>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>> -                 "v3d_render");
>>>>> +                 "v3d_render",
>>>>> +                 &v3d->drm);
>>>>>        if (ret) {
>>>>>            dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>> %d.",
>>>>>                ret);
>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                     &v3d_tfu_sched_ops,
>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>> -                 "v3d_tfu");
>>>>> +                 "v3d_tfu",
>>>>> +                 &v3d->drm);
>>>>>        if (ret) {
>>>>>            dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>                ret);
>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                         &v3d_csd_sched_ops,
>>>>>                         hw_jobs_limit, job_hang_limit,
>>>>>                         msecs_to_jiffies(hang_limit_ms),
>>>>> -                     "v3d_csd");
>>>>> +                     "v3d_csd",
>>>>> +                     &v3d->drm);
>>>>>            if (ret) {
>>>>>                dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>> %d.",
>>>>>                    ret);
>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                         &v3d_cache_clean_sched_ops,
>>>>>                         hw_jobs_limit, job_hang_limit,
>>>>>                         msecs_to_jiffies(hang_limit_ms),
>>>>> -                     "v3d_cache_clean");
>>>>> +                     "v3d_cache_clean",
>>>>> +                     &v3d->drm);
>>>>>            if (ret) {
>>>>>                dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>> scheduler: %d.",
>>>>>                    ret);
>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>> index 9243655..a980709 100644
>>>>> --- a/include/drm/gpu_scheduler.h
>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>> @@ -32,6 +32,7 @@
>>>>>      struct drm_gpu_scheduler;
>>>>>    struct drm_sched_rq;
>>>>> +struct drm_device;
>>>>>      /* These are often used as an (initial) index
>>>>>     * to an array, and as such should start at 0.
>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>     * @score: score to help loadbalancer pick a idle sched
>>>>>     * @ready: marks if the underlying HW is ready to work
>>>>>     * @free_guilty: A hit to time out handler to free the guilty job.
>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>     *
>>>>>     * One scheduler is implemented for each hardware ring.
>>>>>     */
>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>        atomic_t                        score;
>>>>>        bool                ready;
>>>>>        bool                free_guilty;
>>>>> +    struct drm_device        *ddev;
>>>>>    };
>>>>>      int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>               const struct drm_sched_backend_ops *ops,
>>>>>               uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>> -           const char *name);
>>>>> +           const char *name,
>>>>> +           struct drm_device *ddev);
>>>>>      void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>    int drm_sched_job_init(struct drm_sched_job *job,
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
@ 2020-11-24  7:50             ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-24  7:50 UTC (permalink / raw)
  To: Luben Tuikov, christian.koenig, Andrey Grodzovsky, amd-gfx,
	dri-devel, daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 24.11.20 um 02:12 schrieb Luben Tuikov:
> On 2020-11-23 3:06 a.m., Christian König wrote:
>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>> No point to try recovery if device is gone, it's meaningless.
>>>> I think that this should go into the device specific recovery
>>>> function and not in the scheduler.
>>>
>>> The timeout timer is rearmed here, so this prevents any new recovery
>>> work from restarting here after drm_dev_unplug was executed from
>>> amdgpu_pci_remove. It will not cover other places, like job cleanup or
>>> starting a new job, but those should stop once the scheduler thread is
>>> stopped later.
>> Yeah, but this is rather unclean. We should probably return an error
>> code instead, indicating whether or not the timer should be rearmed.
> Christian, this is exactly my work I told you about
> last week on Wednesday in our weekly meeting. And
> which I wrote to you in an email last year about this
> time.

Yeah, that's why I'm suggesting it here as well.

> So what do we do now?

Split your patches into smaller parts and submit them chunk by chunk.

E.g. renames first and then functional changes grouped by area they change.

Regards,
Christian.

>
> I can submit those changes without the last part,
> which builds on this change.
>
> I'm still testing the last part and was hoping
> to submit it all in one sequence of patches,
> after my testing.
>
> Regards,
> Luben
>
>> Christian.
>>
>>> Andrey
>>>
>>>
>>>> Christian.
>>>>
>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>> ---
>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>    drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>    drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>    drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>    drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>    include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>    7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> index d56f402..d0b0021 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>> amdgpu_ring *ring,
>>>>>              r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>                       num_hw_submission, amdgpu_job_hang_limit,
>>>>> -                   timeout, ring->name);
>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>            if (r) {
>>>>>                DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>                      ring->name);
>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> index cd46c88..7678287 100644
>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>          ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>                     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>> +                 gpu->drm);
>>>>>        if (ret)
>>>>>            return ret;
>>>>>    diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>> *pipe, const char *name)
>>>>>          return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>                      lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>> -                  name);
>>>>> +                  name,
>>>>> +                  pipe->ldev->ddev);
>>>>>    }
>>>>>      void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> index 30e7b71..37b03b01 100644
>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>> *pfdev)
>>>>>            ret = drm_sched_init(&js->queue[j].sched,
>>>>>                         &panfrost_sched_ops,
>>>>>                         1, 0, msecs_to_jiffies(500),
>>>>> -                     "pan_js");
>>>>> +                     "pan_js", pfdev->ddev);
>>>>>            if (ret) {
>>>>>                dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>> ret);
>>>>>                goto err_sched;
>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> index c3f0bd0..95db8c6 100644
>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>> @@ -53,6 +53,7 @@
>>>>>    #include <drm/drm_print.h>
>>>>>    #include <drm/gpu_scheduler.h>
>>>>>    #include <drm/spsc_queue.h>
>>>>> +#include <drm/drm_drv.h>
>>>>>      #define CREATE_TRACE_POINTS
>>>>>    #include "gpu_scheduler_trace.h"
>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>> work_struct *work)
>>>>>        struct drm_gpu_scheduler *sched;
>>>>>        struct drm_sched_job *job;
>>>>>    +    int idx;
>>>>> +
>>>>>        sched = container_of(work, struct drm_gpu_scheduler,
>>>>> work_tdr.work);
>>>>>    +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>> scheduler:%s",
>>>>> +             __func__, sched->name);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>        /* Protects against concurrent deletion in
>>>>> drm_sched_get_cleanup_job */
>>>>>        spin_lock(&sched->job_list_lock);
>>>>>        job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>> work_struct *work)
>>>>>        spin_lock(&sched->job_list_lock);
>>>>>        drm_sched_start_timeout(sched);
>>>>>        spin_unlock(&sched->job_list_lock);
>>>>> +
>>>>> +    drm_dev_exit(idx);
>>>>>    }
>>>>>       /**
>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>               unsigned hw_submission,
>>>>>               unsigned hang_limit,
>>>>>               long timeout,
>>>>> -           const char *name)
>>>>> +           const char *name,
>>>>> +           struct drm_device *ddev)
>>>>>    {
>>>>>        int i, ret;
>>>>>        sched->ops = ops;
>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>        sched->name = name;
>>>>>        sched->timeout = timeout;
>>>>>        sched->hang_limit = hang_limit;
>>>>> +    sched->ddev = ddev;
>>>>>        for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>> i++)
>>>>>            drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>    diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> index 0747614..f5076e5 100644
>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                     &v3d_bin_sched_ops,
>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>> -                 "v3d_bin");
>>>>> +                 "v3d_bin",
>>>>> +                 &v3d->drm);
>>>>>        if (ret) {
>>>>>            dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>> %d.", ret);
>>>>>            return ret;
>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                     &v3d_render_sched_ops,
>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>> -                 "v3d_render");
>>>>> +                 "v3d_render",
>>>>> +                 &v3d->drm);
>>>>>        if (ret) {
>>>>>            dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>> %d.",
>>>>>                ret);
>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                     &v3d_tfu_sched_ops,
>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>> -                 "v3d_tfu");
>>>>> +                 "v3d_tfu",
>>>>> +                 &v3d->drm);
>>>>>        if (ret) {
>>>>>            dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>                ret);
>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                         &v3d_csd_sched_ops,
>>>>>                         hw_jobs_limit, job_hang_limit,
>>>>>                         msecs_to_jiffies(hang_limit_ms),
>>>>> -                     "v3d_csd");
>>>>> +                     "v3d_csd",
>>>>> +                     &v3d->drm);
>>>>>            if (ret) {
>>>>>                dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>> %d.",
>>>>>                    ret);
>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>                         &v3d_cache_clean_sched_ops,
>>>>>                         hw_jobs_limit, job_hang_limit,
>>>>>                         msecs_to_jiffies(hang_limit_ms),
>>>>> -                     "v3d_cache_clean");
>>>>> +                     "v3d_cache_clean",
>>>>> +                     &v3d->drm);
>>>>>            if (ret) {
>>>>>                dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>> scheduler: %d.",
>>>>>                    ret);
>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>> index 9243655..a980709 100644
>>>>> --- a/include/drm/gpu_scheduler.h
>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>> @@ -32,6 +32,7 @@
>>>>>      struct drm_gpu_scheduler;
>>>>>    struct drm_sched_rq;
>>>>> +struct drm_device;
>>>>>      /* These are often used as an (initial) index
>>>>>     * to an array, and as such should start at 0.
>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>     * @score: score to help loadbalancer pick a idle sched
>>>>>     * @ready: marks if the underlying HW is ready to work
>>>>>     * @free_guilty: A hit to time out handler to free the guilty job.
>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>     *
>>>>>     * One scheduler is implemented for each hardware ring.
>>>>>     */
>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>        atomic_t                        score;
>>>>>        bool                ready;
>>>>>        bool                free_guilty;
>>>>> +    struct drm_device        *ddev;
>>>>>    };
>>>>>      int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>               const struct drm_sched_backend_ops *ops,
>>>>>               uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>> -           const char *name);
>>>>> +           const char *name,
>>>>> +           struct drm_device *ddev);
>>>>>      void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>    int drm_sched_job_init(struct drm_sched_job *job,

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 02/12] drm: Unmap the entire device address space on device unplug
  2020-11-21 14:16     ` Christian König
@ 2020-11-24 14:44       ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-24 14:44 UTC (permalink / raw)
  To: christian.koenig
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher, yuq825

On Sat, Nov 21, 2020 at 03:16:15PM +0100, Christian König wrote:
> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > Invalidate all BOs CPU mappings once device is removed.
> > 
> > v3: Move the code from TTM into drm_dev_unplug
> > 
> > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> 
> Reviewed-by: Christian König <christian.koenig@amd.com>

Was wondering for a moment whether this should be in drm_dev_unregister
instead, but then it's only one part of the coin really. So

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> 
> > ---
> >   drivers/gpu/drm/drm_drv.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > index 13068fd..d550fd5 100644
> > --- a/drivers/gpu/drm/drm_drv.c
> > +++ b/drivers/gpu/drm/drm_drv.c
> > @@ -479,6 +479,9 @@ void drm_dev_unplug(struct drm_device *dev)
> >   	synchronize_srcu(&drm_unplug_srcu);
> >   	drm_dev_unregister(dev);
> > +
> > +	/* Clear all CPU mappings pointing to this device */
> > +	unmap_mapping_range(dev->anon_inode->i_mapping, 0, 0, 1);
> >   }
> >   EXPORT_SYMBOL(drm_dev_unplug);
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
  2020-11-21  5:21   ` Andrey Grodzovsky
@ 2020-11-24 14:49     ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-24 14:49 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: gregkh, ckoenig.leichtzumerken, dri-devel, amd-gfx,
	daniel.vetter, Alexander.Deucher, yuq825

On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
> Avoids NULL ptr due to kobj->sd being unset on device removal.
> 
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index caf828a..812e592 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -27,6 +27,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/reboot.h>
>  #include <linux/syscalls.h>
> +#include <drm/drm_drv.h>
>  
>  #include "amdgpu.h"
>  #include "amdgpu_ras.h"
> @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
>  		.attrs = attrs,
>  	};
>  
> -	sysfs_remove_group(&adev->dev->kobj, &group);
> +	if (!drm_dev_is_unplugged(&adev->ddev))
> +		sysfs_remove_group(&adev->dev->kobj, &group);

This looks wrong. sysfs, like any other interface, should be
unconditionally thrown out when we do the drm_dev_unregister. Whether
it was hotunplugged or not shouldn't matter at all. Either this isn't
needed at all, or something is wrong with the ordering here. But definitely fishy.
-Daniel

>  
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> index 2b7c90b..54331fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> @@ -24,6 +24,7 @@
>  #include <linux/firmware.h>
>  #include <linux/slab.h>
>  #include <linux/module.h>
> +#include <drm/drm_drv.h>
>  
>  #include "amdgpu.h"
>  #include "amdgpu_ucode.h"
> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
>  
>  void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
>  {
> -	sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> +	if (!drm_dev_is_unplugged(&adev->ddev))
> +		sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>  }
>  
>  static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late
  2020-11-21  5:21   ` Andrey Grodzovsky
@ 2020-11-24 14:53     ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-24 14:53 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: gregkh, ckoenig.leichtzumerken, dri-devel, amd-gfx,
	daniel.vetter, Alexander.Deucher, yuq825

On Sat, Nov 21, 2020 at 12:21:18AM -0500, Andrey Grodzovsky wrote:
> Some of the work in amdgpu_device_fini, such as disabling HW interrupts
> and finalizing pending fences, must be done right away on pci_remove,
> while most of the work that relates to finalizing and releasing driver
> data structures can be deferred until the drm_driver.release hook is
> called, i.e. when the last device reference is dropped.
> 

Uh, fini_late and fini_early are rather meaningless namings, since it's
not clear why there's a split. If you used drm_connector_funcs as
inspiration, that's kinda not good because 'register' itself is a
reserved keyword. That's why we had to add the late_ prefix, could as
well have used C_sucks_ as prefix :-) And then the early_unregister for consistency.

I think fini_hw and fini_sw (or maybe fini_drm) would be a lot clearer
about what they're doing.

I still strongly recommend that you cut over as much as possible of the
fini_hw work to devm_ and for the fini_sw/drm stuff there's drmm_
-Daniel

> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++++++++++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 ++-----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++++++++++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 12 +++++++++++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
>  9 files changed, 65 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 83ac06a..6243f6d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1063,7 +1063,9 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
>  
>  int amdgpu_device_init(struct amdgpu_device *adev,
>  		       uint32_t flags);
> -void amdgpu_device_fini(struct amdgpu_device *adev);
> +void amdgpu_device_fini_early(struct amdgpu_device *adev);
> +void amdgpu_device_fini_late(struct amdgpu_device *adev);
> +
>  int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
>  
>  void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> @@ -1275,6 +1277,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
>  int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
>  void amdgpu_driver_postclose_kms(struct drm_device *dev,
>  				 struct drm_file *file_priv);
> +void amdgpu_driver_release_kms(struct drm_device *dev);
> +
>  int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
>  int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
>  int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 2f60b70..797d94d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3557,14 +3557,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   * Tear down the driver info (all asics).
>   * Called at driver shutdown.
>   */
> -void amdgpu_device_fini(struct amdgpu_device *adev)
> +void amdgpu_device_fini_early(struct amdgpu_device *adev)
>  {
>  	dev_info(adev->dev, "amdgpu: finishing device.\n");
>  	flush_delayed_work(&adev->delayed_init_work);
>  	adev->shutdown = true;
>  
> -	kfree(adev->pci_state);
> -
>  	/* make sure IB test finished before entering exclusive mode
>  	 * to avoid preemption on IB test
>  	 * */
> @@ -3581,11 +3579,18 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>  		else
>  			drm_atomic_helper_shutdown(adev_to_drm(adev));
>  	}
> -	amdgpu_fence_driver_fini(adev);
> +	amdgpu_fence_driver_fini_early(adev);
>  	if (adev->pm_sysfs_en)
>  		amdgpu_pm_sysfs_fini(adev);
>  	amdgpu_fbdev_fini(adev);
> +
> +	amdgpu_irq_fini_early(adev);
> +}
> +
> +void amdgpu_device_fini_late(struct amdgpu_device *adev)
> +{
>  	amdgpu_device_ip_fini(adev);
> +	amdgpu_fence_driver_fini_late(adev);
>  	release_firmware(adev->firmware.gpu_info_fw);
>  	adev->firmware.gpu_info_fw = NULL;
>  	adev->accel_working = false;
> @@ -3621,6 +3626,9 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>  		amdgpu_pmu_fini(adev);
>  	if (adev->mman.discovery_bin)
>  		amdgpu_discovery_fini(adev);
> +
> +	kfree(adev->pci_state);
> +
>  }
>  
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 7f98cf1..3d130fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1244,14 +1244,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>  {
>  	struct drm_device *dev = pci_get_drvdata(pdev);
>  
> -#ifdef MODULE
> -	if (THIS_MODULE->state != MODULE_STATE_GOING)
> -#endif
> -		DRM_ERROR("Hotplug removal is not supported\n");
>  	drm_dev_unplug(dev);
>  	amdgpu_driver_unload_kms(dev);
> +
>  	pci_disable_device(pdev);
> -	pci_set_drvdata(pdev, NULL);
>  	drm_dev_put(dev);
>  }
>  
> @@ -1557,6 +1553,7 @@ static struct drm_driver kms_driver = {
>  	.dumb_create = amdgpu_mode_dumb_create,
>  	.dumb_map_offset = amdgpu_mode_dumb_mmap,
>  	.fops = &amdgpu_driver_kms_fops,
> +	.release = &amdgpu_driver_release_kms,
>  
>  	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>  	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index d0b0021..c123aa6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -523,7 +523,7 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
>   *
>   * Tear down the fence driver for all possible rings (all asics).
>   */
> -void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
> +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev)
>  {
>  	unsigned i, j;
>  	int r;
> @@ -544,6 +544,19 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>  		if (!ring->no_scheduler)
>  			drm_sched_fini(&ring->sched);
>  		del_timer_sync(&ring->fence_drv.fallback_timer);
> +	}
> +}
> +
> +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev)
> +{
> +	unsigned int i, j;
> +
> +	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
> +		struct amdgpu_ring *ring = adev->rings[i];
> +
> +		if (!ring || !ring->fence_drv.initialized)
> +			continue;
> +
>  		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>  			dma_fence_put(ring->fence_drv.fences[j]);
>  		kfree(ring->fence_drv.fences);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> index 300ac73..a833197 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> @@ -49,6 +49,7 @@
>  #include <drm/drm_irq.h>
>  #include <drm/drm_vblank.h>
>  #include <drm/amdgpu_drm.h>
> +#include <drm/drm_drv.h>
>  #include "amdgpu.h"
>  #include "amdgpu_ih.h"
>  #include "atom.h"
> @@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>  	return 0;
>  }
>  
> +
> +void amdgpu_irq_fini_early(struct amdgpu_device *adev)
> +{
> +	if (adev->irq.installed) {
> +		drm_irq_uninstall(&adev->ddev);
> +		adev->irq.installed = false;
> +		if (adev->irq.msi_enabled)
> +			pci_free_irq_vectors(adev->pdev);
> +
> +		if (!amdgpu_device_has_dc_support(adev))
> +			flush_work(&adev->hotplug_work);
> +	}
> +}
> +
>  /**
>   * amdgpu_irq_fini - shut down interrupt handling
>   *
> @@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>  {
>  	unsigned i, j;
>  
> -	if (adev->irq.installed) {
> -		drm_irq_uninstall(adev_to_drm(adev));
> -		adev->irq.installed = false;
> -		if (adev->irq.msi_enabled)
> -			pci_free_irq_vectors(adev->pdev);
> -		if (!amdgpu_device_has_dc_support(adev))
> -			flush_work(&adev->hotplug_work);
> -	}
> -
>  	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>  		if (!adev->irq.client[i].sources)
>  			continue;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> index c718e94..718c70f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> @@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
>  
>  int amdgpu_irq_init(struct amdgpu_device *adev);
>  void amdgpu_irq_fini(struct amdgpu_device *adev);
> +void amdgpu_irq_fini_early(struct amdgpu_device *adev);
>  int amdgpu_irq_add_id(struct amdgpu_device *adev,
>  		      unsigned client_id, unsigned src_id,
>  		      struct amdgpu_irq_src *source);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index a0af8a7..9e30c5c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -29,6 +29,7 @@
>  #include "amdgpu.h"
>  #include <drm/drm_debugfs.h>
>  #include <drm/amdgpu_drm.h>
> +#include <drm/drm_drv.h>
>  #include "amdgpu_sched.h"
>  #include "amdgpu_uvd.h"
>  #include "amdgpu_vce.h"
> @@ -94,7 +95,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>  	}
>  
>  	amdgpu_acpi_fini(adev);
> -	amdgpu_device_fini(adev);
> +	amdgpu_device_fini_early(adev);
>  }
>  
>  void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
> @@ -1147,6 +1148,15 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>  	pm_runtime_put_autosuspend(dev->dev);
>  }
>  
> +
> +void amdgpu_driver_release_kms(struct drm_device *dev)
> +{
> +	struct amdgpu_device *adev = drm_to_adev(dev);
> +
> +	amdgpu_device_fini_late(adev);
> +	pci_set_drvdata(adev->pdev, NULL);
> +}
> +
>  /*
>   * VBlank related functions.
>   */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 9d11b84..caf828a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2142,9 +2142,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
>  {
>  	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>  
> +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
> +
>  	if (!con)
>  		return 0;
>  
> +
>  	/* Need disable ras on all IPs here before ip [hw/sw]fini */
>  	amdgpu_ras_disable_all_features(adev, 0);
>  	amdgpu_ras_recovery_fini(adev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 7112137..074f36b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -107,7 +107,8 @@ struct amdgpu_fence_driver {
>  };
>  
>  int amdgpu_fence_driver_init(struct amdgpu_device *adev);
> -void amdgpu_fence_driver_fini(struct amdgpu_device *adev);
> +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev);
> +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev);
>  void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
>  
>  int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> -- 
> 2.7.4
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late
  2020-11-24 14:53     ` Daniel Vetter
@ 2020-11-24 15:51       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-24 15:51 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: gregkh, ckoenig.leichtzumerken, dri-devel, amd-gfx,
	daniel.vetter, Alexander.Deucher, yuq825


On 11/24/20 9:53 AM, Daniel Vetter wrote:
> On Sat, Nov 21, 2020 at 12:21:18AM -0500, Andrey Grodzovsky wrote:
>> Some of the stuff in amdgpu_device_fini, such as disabling HW interrupts
>> and finalizing pending fences, must be done right away on pci_remove,
>> while most of the stuff related to finalizing and releasing driver data
>> structures can be deferred until the drm_driver.release hook is called,
>> i.e. when the last device reference is dropped.
>>
> Uh, fini_late and fini_early are rather meaningless namings, since it's
> not clear why there's a split. If you used drm_connector_funcs as inspiration,
> that's kinda not good because 'register' itself is a reserved keyword.
> That's why we had to add late_ prefix, could as well have used
> C_sucks_ as prefix :-) And then the early_unregister for consistency.
>
> I think fini_hw and fini_sw (or maybe fini_drm) would be a lot clearer
> about what they're doing.
>
> I still strongly recommend that you cut over as much as possible of the
> fini_hw work to devm_, and for the fini_sw/drm stuff there's drmm_.
> -Daniel


Definitely, and I put it in a TODO list in the RFC patch. Also, as I mentioned
before, I just prefer to leave it for follow-up work because it's non-trivial
and requires shuffling a lot of stuff around in the driver. I was thinking of
committing the work in incremental steps, so it's easier to merge and to
control for breakages.

Andrey
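For context, the lifetime rule under discussion (HW teardown immediately at
pci_remove, SW teardown deferred until the last device reference is dropped)
can be sketched in a few lines of plain userspace C. All names below are
hypothetical; this models the split, it is not the amdgpu code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names throughout: a userspace model of the teardown
 * split, not the kernel implementation. */
struct toy_device {
	int refcount;      /* stand-in for the drm_device reference count */
	bool hw_alive;     /* IRQs, pending fences: must stop at unplug */
	bool sw_alive;     /* driver data structures: freed at release */
};

/* Runs from the pci_remove path: the hardware is gone, so stop
 * everything that touches it right away. */
static void toy_fini_hw(struct toy_device *dev)
{
	dev->hw_alive = false;
}

/* Runs from the drm_driver.release equivalent: only now is it safe to
 * free the software state, because no user can still reach it. */
static void toy_fini_sw(struct toy_device *dev)
{
	dev->sw_alive = false;
}

static void toy_put(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0)
		toy_fini_sw(dev);
}
```

With one reference held by the driver and one by an open file, unplug calls
toy_fini_hw() and drops the driver's reference; the software state then
outlives the hardware until the file's final toy_put().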


>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++++++++++++----
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 ++-----
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++++++++++++++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 12 +++++++++++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
>>   9 files changed, 65 insertions(+), 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 83ac06a..6243f6d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1063,7 +1063,9 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
>>   
>>   int amdgpu_device_init(struct amdgpu_device *adev,
>>   		       uint32_t flags);
>> -void amdgpu_device_fini(struct amdgpu_device *adev);
>> +void amdgpu_device_fini_early(struct amdgpu_device *adev);
>> +void amdgpu_device_fini_late(struct amdgpu_device *adev);
>> +
>>   int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
>>   
>>   void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
>> @@ -1275,6 +1277,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
>>   int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
>>   void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>   				 struct drm_file *file_priv);
>> +void amdgpu_driver_release_kms(struct drm_device *dev);
>> +
>>   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
>>   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
>>   int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 2f60b70..797d94d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -3557,14 +3557,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>    * Tear down the driver info (all asics).
>>    * Called at driver shutdown.
>>    */
>> -void amdgpu_device_fini(struct amdgpu_device *adev)
>> +void amdgpu_device_fini_early(struct amdgpu_device *adev)
>>   {
>>   	dev_info(adev->dev, "amdgpu: finishing device.\n");
>>   	flush_delayed_work(&adev->delayed_init_work);
>>   	adev->shutdown = true;
>>   
>> -	kfree(adev->pci_state);
>> -
>>   	/* make sure IB test finished before entering exclusive mode
>>   	 * to avoid preemption on IB test
>>   	 * */
>> @@ -3581,11 +3579,18 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>>   		else
>>   			drm_atomic_helper_shutdown(adev_to_drm(adev));
>>   	}
>> -	amdgpu_fence_driver_fini(adev);
>> +	amdgpu_fence_driver_fini_early(adev);
>>   	if (adev->pm_sysfs_en)
>>   		amdgpu_pm_sysfs_fini(adev);
>>   	amdgpu_fbdev_fini(adev);
>> +
>> +	amdgpu_irq_fini_early(adev);
>> +}
>> +
>> +void amdgpu_device_fini_late(struct amdgpu_device *adev)
>> +{
>>   	amdgpu_device_ip_fini(adev);
>> +	amdgpu_fence_driver_fini_late(adev);
>>   	release_firmware(adev->firmware.gpu_info_fw);
>>   	adev->firmware.gpu_info_fw = NULL;
>>   	adev->accel_working = false;
>> @@ -3621,6 +3626,9 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>>   		amdgpu_pmu_fini(adev);
>>   	if (adev->mman.discovery_bin)
>>   		amdgpu_discovery_fini(adev);
>> +
>> +	kfree(adev->pci_state);
>> +
>>   }
>>   
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 7f98cf1..3d130fc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -1244,14 +1244,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>>   {
>>   	struct drm_device *dev = pci_get_drvdata(pdev);
>>   
>> -#ifdef MODULE
>> -	if (THIS_MODULE->state != MODULE_STATE_GOING)
>> -#endif
>> -		DRM_ERROR("Hotplug removal is not supported\n");
>>   	drm_dev_unplug(dev);
>>   	amdgpu_driver_unload_kms(dev);
>> +
>>   	pci_disable_device(pdev);
>> -	pci_set_drvdata(pdev, NULL);
>>   	drm_dev_put(dev);
>>   }
>>   
>> @@ -1557,6 +1553,7 @@ static struct drm_driver kms_driver = {
>>   	.dumb_create = amdgpu_mode_dumb_create,
>>   	.dumb_map_offset = amdgpu_mode_dumb_mmap,
>>   	.fops = &amdgpu_driver_kms_fops,
>> +	.release = &amdgpu_driver_release_kms,
>>   
>>   	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>>   	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index d0b0021..c123aa6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -523,7 +523,7 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
>>    *
>>    * Tear down the fence driver for all possible rings (all asics).
>>    */
>> -void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>> +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev)
>>   {
>>   	unsigned i, j;
>>   	int r;
>> @@ -544,6 +544,19 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>>   		if (!ring->no_scheduler)
>>   			drm_sched_fini(&ring->sched);
>>   		del_timer_sync(&ring->fence_drv.fallback_timer);
>> +	}
>> +}
>> +
>> +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev)
>> +{
>> +	unsigned int i, j;
>> +
>> +	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
>> +		struct amdgpu_ring *ring = adev->rings[i];
>> +
>> +		if (!ring || !ring->fence_drv.initialized)
>> +			continue;
>> +
>>   		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>>   			dma_fence_put(ring->fence_drv.fences[j]);
>>   		kfree(ring->fence_drv.fences);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index 300ac73..a833197 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -49,6 +49,7 @@
>>   #include <drm/drm_irq.h>
>>   #include <drm/drm_vblank.h>
>>   #include <drm/amdgpu_drm.h>
>> +#include <drm/drm_drv.h>
>>   #include "amdgpu.h"
>>   #include "amdgpu_ih.h"
>>   #include "atom.h"
>> @@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>   	return 0;
>>   }
>>   
>> +
>> +void amdgpu_irq_fini_early(struct amdgpu_device *adev)
>> +{
>> +	if (adev->irq.installed) {
>> +		drm_irq_uninstall(&adev->ddev);
>> +		adev->irq.installed = false;
>> +		if (adev->irq.msi_enabled)
>> +			pci_free_irq_vectors(adev->pdev);
>> +
>> +		if (!amdgpu_device_has_dc_support(adev))
>> +			flush_work(&adev->hotplug_work);
>> +	}
>> +}
>> +
>>   /**
>>    * amdgpu_irq_fini - shut down interrupt handling
>>    *
>> @@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>>   {
>>   	unsigned i, j;
>>   
>> -	if (adev->irq.installed) {
>> -		drm_irq_uninstall(adev_to_drm(adev));
>> -		adev->irq.installed = false;
>> -		if (adev->irq.msi_enabled)
>> -			pci_free_irq_vectors(adev->pdev);
>> -		if (!amdgpu_device_has_dc_support(adev))
>> -			flush_work(&adev->hotplug_work);
>> -	}
>> -
>>   	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>>   		if (!adev->irq.client[i].sources)
>>   			continue;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> index c718e94..718c70f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> @@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
>>   
>>   int amdgpu_irq_init(struct amdgpu_device *adev);
>>   void amdgpu_irq_fini(struct amdgpu_device *adev);
>> +void amdgpu_irq_fini_early(struct amdgpu_device *adev);
>>   int amdgpu_irq_add_id(struct amdgpu_device *adev,
>>   		      unsigned client_id, unsigned src_id,
>>   		      struct amdgpu_irq_src *source);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index a0af8a7..9e30c5c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -29,6 +29,7 @@
>>   #include "amdgpu.h"
>>   #include <drm/drm_debugfs.h>
>>   #include <drm/amdgpu_drm.h>
>> +#include <drm/drm_drv.h>
>>   #include "amdgpu_sched.h"
>>   #include "amdgpu_uvd.h"
>>   #include "amdgpu_vce.h"
>> @@ -94,7 +95,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>   	}
>>   
>>   	amdgpu_acpi_fini(adev);
>> -	amdgpu_device_fini(adev);
>> +	amdgpu_device_fini_early(adev);
>>   }
>>   
>>   void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
>> @@ -1147,6 +1148,15 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>   	pm_runtime_put_autosuspend(dev->dev);
>>   }
>>   
>> +
>> +void amdgpu_driver_release_kms(struct drm_device *dev)
>> +{
>> +	struct amdgpu_device *adev = drm_to_adev(dev);
>> +
>> +	amdgpu_device_fini_late(adev);
>> +	pci_set_drvdata(adev->pdev, NULL);
>> +}
>> +
>>   /*
>>    * VBlank related functions.
>>    */
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> index 9d11b84..caf828a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> @@ -2142,9 +2142,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
>>   {
>>   	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>>   
>> +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
>> +
>>   	if (!con)
>>   		return 0;
>>   
>> +
>>   	/* Need disable ras on all IPs here before ip [hw/sw]fini */
>>   	amdgpu_ras_disable_all_features(adev, 0);
>>   	amdgpu_ras_recovery_fini(adev);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index 7112137..074f36b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -107,7 +107,8 @@ struct amdgpu_fence_driver {
>>   };
>>   
>>   int amdgpu_fence_driver_init(struct amdgpu_device *adev);
>> -void amdgpu_fence_driver_fini(struct amdgpu_device *adev);
>> +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev);
>> +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev);
>>   void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
>>   
>>   int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>> -- 
>> 2.7.4
>>
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -523,7 +523,7 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
>>    *
>>    * Tear down the fence driver for all possible rings (all asics).
>>    */
>> -void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>> +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev)
>>   {
>>   	unsigned i, j;
>>   	int r;
>> @@ -544,6 +544,19 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>>   		if (!ring->no_scheduler)
>>   			drm_sched_fini(&ring->sched);
>>   		del_timer_sync(&ring->fence_drv.fallback_timer);
>> +	}
>> +}
>> +
>> +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev)
>> +{
>> +	unsigned int i, j;
>> +
>> +	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
>> +		struct amdgpu_ring *ring = adev->rings[i];
>> +
>> +		if (!ring || !ring->fence_drv.initialized)
>> +			continue;
>> +
>>   		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>>   			dma_fence_put(ring->fence_drv.fences[j]);
>>   		kfree(ring->fence_drv.fences);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> index 300ac73..a833197 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>> @@ -49,6 +49,7 @@
>>   #include <drm/drm_irq.h>
>>   #include <drm/drm_vblank.h>
>>   #include <drm/amdgpu_drm.h>
>> +#include <drm/drm_drv.h>
>>   #include "amdgpu.h"
>>   #include "amdgpu_ih.h"
>>   #include "atom.h"
>> @@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>   	return 0;
>>   }
>>   
>> +
>> +void amdgpu_irq_fini_early(struct amdgpu_device *adev)
>> +{
>> +	if (adev->irq.installed) {
>> +		drm_irq_uninstall(&adev->ddev);
>> +		adev->irq.installed = false;
>> +		if (adev->irq.msi_enabled)
>> +			pci_free_irq_vectors(adev->pdev);
>> +
>> +		if (!amdgpu_device_has_dc_support(adev))
>> +			flush_work(&adev->hotplug_work);
>> +	}
>> +}
>> +
>>   /**
>>    * amdgpu_irq_fini - shut down interrupt handling
>>    *
>> @@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>>   {
>>   	unsigned i, j;
>>   
>> -	if (adev->irq.installed) {
>> -		drm_irq_uninstall(adev_to_drm(adev));
>> -		adev->irq.installed = false;
>> -		if (adev->irq.msi_enabled)
>> -			pci_free_irq_vectors(adev->pdev);
>> -		if (!amdgpu_device_has_dc_support(adev))
>> -			flush_work(&adev->hotplug_work);
>> -	}
>> -
>>   	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>>   		if (!adev->irq.client[i].sources)
>>   			continue;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> index c718e94..718c70f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>> @@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
>>   
>>   int amdgpu_irq_init(struct amdgpu_device *adev);
>>   void amdgpu_irq_fini(struct amdgpu_device *adev);
>> +void amdgpu_irq_fini_early(struct amdgpu_device *adev);
>>   int amdgpu_irq_add_id(struct amdgpu_device *adev,
>>   		      unsigned client_id, unsigned src_id,
>>   		      struct amdgpu_irq_src *source);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index a0af8a7..9e30c5c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -29,6 +29,7 @@
>>   #include "amdgpu.h"
>>   #include <drm/drm_debugfs.h>
>>   #include <drm/amdgpu_drm.h>
>> +#include <drm/drm_drv.h>
>>   #include "amdgpu_sched.h"
>>   #include "amdgpu_uvd.h"
>>   #include "amdgpu_vce.h"
>> @@ -94,7 +95,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>   	}
>>   
>>   	amdgpu_acpi_fini(adev);
>> -	amdgpu_device_fini(adev);
>> +	amdgpu_device_fini_early(adev);
>>   }
>>   
>>   void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
>> @@ -1147,6 +1148,15 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>   	pm_runtime_put_autosuspend(dev->dev);
>>   }
>>   
>> +
>> +void amdgpu_driver_release_kms(struct drm_device *dev)
>> +{
>> +	struct amdgpu_device *adev = drm_to_adev(dev);
>> +
>> +	amdgpu_device_fini_late(adev);
>> +	pci_set_drvdata(adev->pdev, NULL);
>> +}
>> +
>>   /*
>>    * VBlank related functions.
>>    */
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> index 9d11b84..caf828a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> @@ -2142,9 +2142,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
>>   {
>>   	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>>   
>> +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
>> +
>>   	if (!con)
>>   		return 0;
>>   
>> +
>>   	/* Need disable ras on all IPs here before ip [hw/sw]fini */
>>   	amdgpu_ras_disable_all_features(adev, 0);
>>   	amdgpu_ras_recovery_fini(adev);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index 7112137..074f36b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -107,7 +107,8 @@ struct amdgpu_fence_driver {
>>   };
>>   
>>   int amdgpu_fence_driver_init(struct amdgpu_device *adev);
>> -void amdgpu_fence_driver_fini(struct amdgpu_device *adev);
>> +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev);
>> +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev);
>>   void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
>>   
>>   int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>> -- 
>> 2.7.4
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-24  7:41                 ` Christian König
@ 2020-11-24 16:22                   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-24 16:22 UTC (permalink / raw)
  To: Christian König, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/24/20 2:41 AM, Christian König wrote:
> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
>>
>> On 11/23/20 3:41 PM, Christian König wrote:
>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>>>
>>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>>>
>>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>>> before device's IOMMU group is released.
>>>>>>>
>>>>>>> It would be cleaner if we could do the whole handling in TTM. I also 
>>>>>>> need to double check what you are doing with this function.
>>>>>>>
>>>>>>> Christian.
>>>>>>
>>>>>>
>>>>>> Check patch "drm/amdgpu: Register IOMMU topology notifier per device." to 
>>>>>> see
>>>>>> how i use it. I don't see why this should go into TTM mid-layer - the 
>>>>>> stuff I do inside
>>>>>> is vendor specific and also I don't think TTM is explicitly aware of IOMMU ?
>>>>>> Do you mean you prefer the IOMMU notifier to be registered from within TTM
>>>>>> and then use a hook to call into vendor specific handler ?
>>>>>
>>>>> No, that is really vendor specific.
>>>>>
>>>>> What I meant is to have a function like ttm_resource_manager_evict_all() 
>>>>> which you only need to call and all tt objects are unpopulated.
>>>>
>>>>
>>>> So instead of this BO list i create and later iterate in amdgpu from the 
>>>> IOMMU patch you just want to do it within
>>>> TTM with a single function ? Makes much more sense.
>>>
>>> Yes, exactly.
>>>
>>> The list_empty() checks we have in TTM for the LRU are actually not the best 
>>> idea, we should now check the pin_count instead. This way we could also have 
>>> a list of the pinned BOs in TTM.
>>
>>
>> So from my IOMMU topology handler I will iterate the TTM LRU for the unpinned 
>> BOs and this new function for the pinned ones  ?
>> It's probably a good idea to combine both iterations into this new function 
>> to cover all the BOs allocated on the device.
>
> Yes, that's what I had in my mind as well.
>
>>
>>
>>>
>>> BTW: Have you thought about what happens when we unpopulate a BO while we 
>>> still try to use a kernel mapping for it? That could have unforeseen 
>>> consequences.
>>
>>
>> Are you asking what happens to kmap or vmap style mapped CPU accesses once we 
>> drop all the DMA backing pages for a particular BO ? Because for user mappings
>> (mmap) we took care of this with dummy page reroute but indeed nothing was 
>> done for in kernel CPU mappings.
>
> Yes exactly that.
>
> In other words what happens if we free the ring buffer while the kernel still 
> writes to it?
>
> Christian.


While we can't explicitly control user application accesses to the mapped
buffers, and hence use page fault rerouting, I am thinking that in this
case we may be able to sprinkle drm_dev_enter/exit in any such sensitive
place where we might CPU-access a DMA buffer from the kernel. Things like
CPU page table updates, ring buffer accesses and FW memcpy? Are there
other places?
Another point is that at this stage the driver shouldn't access any such
buffers, as we are in the process of finishing the device.
AFAIK there is no page fault mechanism for kernel mappings, so I don't
think there is anything else to do?

Andrey


>
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>
>>>>> Give me a day or two to look into this.
>>>>>
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>   drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>>>>>>>>   1 file changed, 1 insertion(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>>>> index 1ccf1ef..29248a5 100644
>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
>>>>>>>> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>>>>>>>>       else
>>>>>>>>           ttm_pool_unpopulate(ttm);
>>>>>>>>   }
>>>>>>>> +EXPORT_SYMBOL(ttm_tt_unpopulate);
>>>>>>>
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>>>>>
>>>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-24 16:22                   ` Andrey Grodzovsky
@ 2020-11-24 16:44                     ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-24 16:44 UTC (permalink / raw)
  To: Andrey Grodzovsky, Christian König, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
>
> On 11/24/20 2:41 AM, Christian König wrote:
>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
>>>
>>> On 11/23/20 3:41 PM, Christian König wrote:
>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>>>>
>>>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>>>>
>>>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>>>> before device's IOMMU group is released.
>>>>>>>>
>>>>>>>> It would be cleaner if we could do the whole handling in TTM. I 
>>>>>>>> also need to double check what you are doing with this function.
>>>>>>>>
>>>>>>>> Christian.
>>>>>>>
>>>>>>>
>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology notifier per 
>>>>>>> device." to see
>>>>>>> how i use it. I don't see why this should go into TTM mid-layer 
>>>>>>> - the stuff I do inside
>>>>>>> is vendor specific and also I don't think TTM is explicitly 
>>>>>>> aware of IOMMU ?
>>>>>>> Do you mean you prefer the IOMMU notifier to be registered from 
>>>>>>> within TTM
>>>>>>> and then use a hook to call into vendor specific handler ?
>>>>>>
>>>>>> No, that is really vendor specific.
>>>>>>
>>>>>> What I meant is to have a function like 
>>>>>> ttm_resource_manager_evict_all() which you only need to call and 
>>>>>> all tt objects are unpopulated.
>>>>>
>>>>>
>>>>> So instead of this BO list i create and later iterate in amdgpu 
>>>>> from the IOMMU patch you just want to do it within
>>>>> TTM with a single function ? Makes much more sense.
>>>>
>>>> Yes, exactly.
>>>>
>>>> The list_empty() checks we have in TTM for the LRU are actually not 
>>>> the best idea, we should now check the pin_count instead. This way 
>>>> we could also have a list of the pinned BOs in TTM.
>>>
>>>
>>> So from my IOMMU topology handler I will iterate the TTM LRU for the 
>>> unpinned BOs and this new function for the pinned ones  ?
>>> It's probably a good idea to combine both iterations into this new 
>>> function to cover all the BOs allocated on the device.
>>
>> Yes, that's what I had in my mind as well.
>>
>>>
>>>
>>>>
>>>> BTW: Have you thought about what happens when we unpopulate a BO 
>>>> while we still try to use a kernel mapping for it? That could have 
>>>> unforeseen consequences.
>>>
>>>
>>> Are you asking what happens to kmap or vmap style mapped CPU 
>>> accesses once we drop all the DMA backing pages for a particular BO 
>>> ? Because for user mappings
>>> (mmap) we took care of this with dummy page reroute but indeed 
>>> nothing was done for in kernel CPU mappings.
>>
>> Yes exactly that.
>>
>> In other words what happens if we free the ring buffer while the 
>> kernel still writes to it?
>>
>> Christian.
>
>
> While we can't control user application accesses to the mapped buffers 
> explicitly and hence we use page fault rerouting
> I am thinking that in this  case we may be able to sprinkle 
> drm_dev_enter/exit in any such sensitive place were we might
> CPU access a DMA buffer from the kernel ?

Yes, I fear we are going to need that.

> Things like CPU page table updates, ring buffer accesses and FW memcpy 
> ? Is there other places ?

Puh, good question. I have no idea.

> Another point is that at this point the driver shouldn't access any 
> such buffers as we are at the process finishing the device.
> AFAIK there is no page fault mechanism for kernel mappings so I don't 
> think there is anything else to do ?

Well there is a page fault handler for kernel mappings, but that one 
just prints the stack trace into the system log and calls BUG(); :)

Long story short we need to avoid any access to released pages after 
unplug. No matter if it's from the kernel or userspace.

Regards,
Christian.

>
> Andrey

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-24  7:50             ` Christian König
@ 2020-11-24 17:11               ` Luben Tuikov
  -1 siblings, 0 replies; 212+ messages in thread
From: Luben Tuikov @ 2020-11-24 17:11 UTC (permalink / raw)
  To: christian.koenig, Andrey Grodzovsky, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

On 2020-11-24 2:50 a.m., Christian König wrote:
> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>> No point to try recovery if device is gone, it's meaningless.
>>>>> I think that this should go into the device specific recovery
>>>>> function and not in the scheduler.
>>>>
>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>> work to restart from here
>>>> after drm_dev_unplug was executed from amdgpu_pci_remove.It will not
>>>> cover other places like
>>>> job cleanup or starting new job but those should stop once the
>>>> scheduler thread is stopped later.
>>> Yeah, but this is rather unclean. We should probably return an error
>>> code instead if the timer should be rearmed or not.
>> Christian, this is exactly my work I told you about
>> last week on Wednesday in our weekly meeting. And
>> which I wrote to you in an email last year about this
>> time.
> 
> Yeah, that's why I'm suggesting it here as well.

It seems you're suggesting that Andrey do it, while
all too well you know I've been working on this
for some time now.

I wrote to you about this around this time
last year in an email, and I discussed it at the
Wednesday meeting.

You could've mentioned that here the first time.

> 
>> So what do we do now?
> 
> Split your patches into smaller parts and submit them chunk by chunk.
> 
> E.g. renames first and then functional changes grouped by area they change.

I have, but my final patch, a tiny one but which implements
the core reason for the change seems buggy, and I'm looking
for a way to debug it.

Regards,
Luben


> 
> Regards,
> Christian.
> 
>>
>> I can submit those changes without the last part,
>> which builds on this change.
>>
>> I'm still testing the last part and was hoping
>> to submit it all in one sequence of patches,
>> after my testing.
>>
>> Regards,
>> Luben
>>
>>> Christian.
>>>
>>>> Andrey
>>>>
>>>>
>>>>> Christian.
>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>>    drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>>    drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>>    drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>>    drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>>    include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>>    7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> index d56f402..d0b0021 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>>> amdgpu_ring *ring,
>>>>>>              r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>                       num_hw_submission, amdgpu_job_hang_limit,
>>>>>> -                   timeout, ring->name);
>>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>>            if (r) {
>>>>>>                DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>                      ring->name);
>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> index cd46c88..7678287 100644
>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>>          ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>>                     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>>> +                 gpu->drm);
>>>>>>        if (ret)
>>>>>>            return ret;
>>>>>>    diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>>> *pipe, const char *name)
>>>>>>          return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>>                      lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>>> -                  name);
>>>>>> +                  name,
>>>>>> +                  pipe->ldev->ddev);
>>>>>>    }
>>>>>>      void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> index 30e7b71..37b03b01 100644
>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>>> *pfdev)
>>>>>>            ret = drm_sched_init(&js->queue[j].sched,
>>>>>>                         &panfrost_sched_ops,
>>>>>>                         1, 0, msecs_to_jiffies(500),
>>>>>> -                     "pan_js");
>>>>>> +                     "pan_js", pfdev->ddev);
>>>>>>            if (ret) {
>>>>>>                dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>>> ret);
>>>>>>                goto err_sched;
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index c3f0bd0..95db8c6 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> @@ -53,6 +53,7 @@
>>>>>>    #include <drm/drm_print.h>
>>>>>>    #include <drm/gpu_scheduler.h>
>>>>>>    #include <drm/spsc_queue.h>
>>>>>> +#include <drm/drm_drv.h>
>>>>>>      #define CREATE_TRACE_POINTS
>>>>>>    #include "gpu_scheduler_trace.h"
>>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>>> work_struct *work)
>>>>>>        struct drm_gpu_scheduler *sched;
>>>>>>        struct drm_sched_job *job;
>>>>>>    +    int idx;
>>>>>> +
>>>>>>        sched = container_of(work, struct drm_gpu_scheduler,
>>>>>> work_tdr.work);
>>>>>>    +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>>> scheduler:%s",
>>>>>> +             __func__, sched->name);
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>        /* Protects against concurrent deletion in
>>>>>> drm_sched_get_cleanup_job */
>>>>>>        spin_lock(&sched->job_list_lock);
>>>>>>        job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>>> work_struct *work)
>>>>>>        spin_lock(&sched->job_list_lock);
>>>>>>        drm_sched_start_timeout(sched);
>>>>>>        spin_unlock(&sched->job_list_lock);
>>>>>> +
>>>>>> +    drm_dev_exit(idx);
>>>>>>    }
>>>>>>       /**
>>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>               unsigned hw_submission,
>>>>>>               unsigned hang_limit,
>>>>>>               long timeout,
>>>>>> -           const char *name)
>>>>>> +           const char *name,
>>>>>> +           struct drm_device *ddev)
>>>>>>    {
>>>>>>        int i, ret;
>>>>>>        sched->ops = ops;
>>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>        sched->name = name;
>>>>>>        sched->timeout = timeout;
>>>>>>        sched->hang_limit = hang_limit;
>>>>>> +    sched->ddev = ddev;
>>>>>>        for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>>> i++)
>>>>>>            drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>>    diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> index 0747614..f5076e5 100644
>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                     &v3d_bin_sched_ops,
>>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>>> -                 "v3d_bin");
>>>>>> +                 "v3d_bin",
>>>>>> +                 &v3d->drm);
>>>>>>        if (ret) {
>>>>>>            dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>>> %d.", ret);
>>>>>>            return ret;
>>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                     &v3d_render_sched_ops,
>>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>>> -                 "v3d_render");
>>>>>> +                 "v3d_render",
>>>>>> +                 &v3d->drm);
>>>>>>        if (ret) {
>>>>>>            dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>>> %d.",
>>>>>>                ret);
>>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                     &v3d_tfu_sched_ops,
>>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>>> -                 "v3d_tfu");
>>>>>> +                 "v3d_tfu",
>>>>>> +                 &v3d->drm);
>>>>>>        if (ret) {
>>>>>>            dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>>                ret);
>>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                         &v3d_csd_sched_ops,
>>>>>>                         hw_jobs_limit, job_hang_limit,
>>>>>>                         msecs_to_jiffies(hang_limit_ms),
>>>>>> -                     "v3d_csd");
>>>>>> +                     "v3d_csd",
>>>>>> +                     &v3d->drm);
>>>>>>            if (ret) {
>>>>>>                dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>>> %d.",
>>>>>>                    ret);
>>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                         &v3d_cache_clean_sched_ops,
>>>>>>                         hw_jobs_limit, job_hang_limit,
>>>>>>                         msecs_to_jiffies(hang_limit_ms),
>>>>>> -                     "v3d_cache_clean");
>>>>>> +                     "v3d_cache_clean",
>>>>>> +                     &v3d->drm);
>>>>>>            if (ret) {
>>>>>>                dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>>> scheduler: %d.",
>>>>>>                    ret);
>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>> index 9243655..a980709 100644
>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>> @@ -32,6 +32,7 @@
>>>>>>      struct drm_gpu_scheduler;
>>>>>>    struct drm_sched_rq;
>>>>>> +struct drm_device;
>>>>>>      /* These are often used as an (initial) index
>>>>>>     * to an array, and as such should start at 0.
>>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>>     * @score: score to help loadbalancer pick a idle sched
>>>>>>     * @ready: marks if the underlying HW is ready to work
>>>>>>     * @free_guilty: A hit to time out handler to free the guilty job.
>>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>>     *
>>>>>>     * One scheduler is implemented for each hardware ring.
>>>>>>     */
>>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>>        atomic_t                        score;
>>>>>>        bool                ready;
>>>>>>        bool                free_guilty;
>>>>>> +    struct drm_device        *ddev;
>>>>>>    };
>>>>>>      int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>               const struct drm_sched_backend_ops *ops,
>>>>>>               uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>>> -           const char *name);
>>>>>> +           const char *name,
>>>>>> +           struct drm_device *ddev);
>>>>>>      void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>    int drm_sched_job_init(struct drm_sched_job *job,
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
@ 2020-11-24 17:11               ` Luben Tuikov
  0 siblings, 0 replies; 212+ messages in thread
From: Luben Tuikov @ 2020-11-24 17:11 UTC (permalink / raw)
  To: christian.koenig, Andrey Grodzovsky, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

On 2020-11-24 2:50 a.m., Christian König wrote:
> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>> No point to try recovery if device is gone, it's meaningless.
>>>>> I think that this should go into the device specific recovery
>>>>> function and not in the scheduler.
>>>>
>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>> work from restarting from here
>>>> after drm_dev_unplug was executed from amdgpu_pci_remove. It will not
>>>> cover other places like
>>>> job cleanup or starting new job but those should stop once the
>>>> scheduler thread is stopped later.
>>> Yeah, but this is rather unclean. We should probably return an error
>>> code indicating whether the timer should be rearmed or not.
>> Christian, this is exactly the work I told you about
>> last Wednesday in our weekly meeting, and which I
>> wrote to you about in an email around this time
>> last year.
> 
> Yeah, that's why I'm suggesting it here as well.

It seems you're suggesting that Andrey do it,
while you know all too well that I've been
working on this for some time now.

I wrote to you about this in an email around this
time last year, and I discussed it at the
Wednesday meeting.

You could've mentioned that here the first time.

> 
>> So what do we do now?
> 
> Split your patches into smaller parts and submit them chunk by chunk.
> 
> E.g. renames first and then functional changes grouped by area they change.

I have, but my final patch, a tiny one which implements
the core reason for the change, seems buggy, and I'm
looking for a way to debug it.

Regards,
Luben


> 
> Regards,
> Christian.
> 
>>
>> I can submit those changes without the last part,
>> which builds on this change.
>>
>> I'm still testing the last part and was hoping
>> to submit it all in one sequence of patches,
>> after my testing.
>>
>> Regards,
>> Luben
>>
>>> Christian.
>>>
>>>> Andrey
>>>>
>>>>
>>>>> Christian.
>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>>    drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>>    drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>>    drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>>    drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>>    drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>>    include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>>    7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> index d56f402..d0b0021 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>>> amdgpu_ring *ring,
>>>>>>              r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>                       num_hw_submission, amdgpu_job_hang_limit,
>>>>>> -                   timeout, ring->name);
>>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>>            if (r) {
>>>>>>                DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>                      ring->name);
>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> index cd46c88..7678287 100644
>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>>          ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>>                     etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>>> +                 gpu->drm);
>>>>>>        if (ret)
>>>>>>            return ret;
>>>>>>    diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>>> *pipe, const char *name)
>>>>>>          return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>>                      lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>>> -                  name);
>>>>>> +                  name,
>>>>>> +                  pipe->ldev->ddev);
>>>>>>    }
>>>>>>      void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> index 30e7b71..37b03b01 100644
>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>>> *pfdev)
>>>>>>            ret = drm_sched_init(&js->queue[j].sched,
>>>>>>                         &panfrost_sched_ops,
>>>>>>                         1, 0, msecs_to_jiffies(500),
>>>>>> -                     "pan_js");
>>>>>> +                     "pan_js", pfdev->ddev);
>>>>>>            if (ret) {
>>>>>>                dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>>> ret);
>>>>>>                goto err_sched;
>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> index c3f0bd0..95db8c6 100644
>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>> @@ -53,6 +53,7 @@
>>>>>>    #include <drm/drm_print.h>
>>>>>>    #include <drm/gpu_scheduler.h>
>>>>>>    #include <drm/spsc_queue.h>
>>>>>> +#include <drm/drm_drv.h>
>>>>>>      #define CREATE_TRACE_POINTS
>>>>>>    #include "gpu_scheduler_trace.h"
>>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>>> work_struct *work)
>>>>>>        struct drm_gpu_scheduler *sched;
>>>>>>        struct drm_sched_job *job;
>>>>>>    +    int idx;
>>>>>> +
>>>>>>        sched = container_of(work, struct drm_gpu_scheduler,
>>>>>> work_tdr.work);
>>>>>>    +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>>> scheduler:%s",
>>>>>> +             __func__, sched->name);
>>>>>> +        return;
>>>>>> +    }
>>>>>> +
>>>>>>        /* Protects against concurrent deletion in
>>>>>> drm_sched_get_cleanup_job */
>>>>>>        spin_lock(&sched->job_list_lock);
>>>>>>        job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>>> work_struct *work)
>>>>>>        spin_lock(&sched->job_list_lock);
>>>>>>        drm_sched_start_timeout(sched);
>>>>>>        spin_unlock(&sched->job_list_lock);
>>>>>> +
>>>>>> +    drm_dev_exit(idx);
>>>>>>    }
>>>>>>       /**
>>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>               unsigned hw_submission,
>>>>>>               unsigned hang_limit,
>>>>>>               long timeout,
>>>>>> -           const char *name)
>>>>>> +           const char *name,
>>>>>> +           struct drm_device *ddev)
>>>>>>    {
>>>>>>        int i, ret;
>>>>>>        sched->ops = ops;
>>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>        sched->name = name;
>>>>>>        sched->timeout = timeout;
>>>>>>        sched->hang_limit = hang_limit;
>>>>>> +    sched->ddev = ddev;
>>>>>>        for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>>> i++)
>>>>>>            drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>>    diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> index 0747614..f5076e5 100644
>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                     &v3d_bin_sched_ops,
>>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>>> -                 "v3d_bin");
>>>>>> +                 "v3d_bin",
>>>>>> +                 &v3d->drm);
>>>>>>        if (ret) {
>>>>>>            dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>>> %d.", ret);
>>>>>>            return ret;
>>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                     &v3d_render_sched_ops,
>>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>>> -                 "v3d_render");
>>>>>> +                 "v3d_render",
>>>>>> +                 &v3d->drm);
>>>>>>        if (ret) {
>>>>>>            dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>>> %d.",
>>>>>>                ret);
>>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                     &v3d_tfu_sched_ops,
>>>>>>                     hw_jobs_limit, job_hang_limit,
>>>>>>                     msecs_to_jiffies(hang_limit_ms),
>>>>>> -                 "v3d_tfu");
>>>>>> +                 "v3d_tfu",
>>>>>> +                 &v3d->drm);
>>>>>>        if (ret) {
>>>>>>            dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>>                ret);
>>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                         &v3d_csd_sched_ops,
>>>>>>                         hw_jobs_limit, job_hang_limit,
>>>>>>                         msecs_to_jiffies(hang_limit_ms),
>>>>>> -                     "v3d_csd");
>>>>>> +                     "v3d_csd",
>>>>>> +                     &v3d->drm);
>>>>>>            if (ret) {
>>>>>>                dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>>> %d.",
>>>>>>                    ret);
>>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>                         &v3d_cache_clean_sched_ops,
>>>>>>                         hw_jobs_limit, job_hang_limit,
>>>>>>                         msecs_to_jiffies(hang_limit_ms),
>>>>>> -                     "v3d_cache_clean");
>>>>>> +                     "v3d_cache_clean",
>>>>>> +                     &v3d->drm);
>>>>>>            if (ret) {
>>>>>>                dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>>> scheduler: %d.",
>>>>>>                    ret);
>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>> index 9243655..a980709 100644
>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>> @@ -32,6 +32,7 @@
>>>>>>      struct drm_gpu_scheduler;
>>>>>>    struct drm_sched_rq;
>>>>>> +struct drm_device;
>>>>>>      /* These are often used as an (initial) index
>>>>>>     * to an array, and as such should start at 0.
>>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>>     * @score: score to help loadbalancer pick a idle sched
>>>>>>     * @ready: marks if the underlying HW is ready to work
>>>>>>     * @free_guilty: A hit to time out handler to free the guilty job.
>>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>>     *
>>>>>>     * One scheduler is implemented for each hardware ring.
>>>>>>     */
>>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>>        atomic_t                        score;
>>>>>>        bool                ready;
>>>>>>        bool                free_guilty;
>>>>>> +    struct drm_device        *ddev;
>>>>>>    };
>>>>>>      int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>               const struct drm_sched_backend_ops *ops,
>>>>>>               uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>>> -           const char *name);
>>>>>> +           const char *name,
>>>>>> +           struct drm_device *ddev);
>>>>>>      void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>    int drm_sched_job_init(struct drm_sched_job *job,
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-24 17:11               ` Luben Tuikov
@ 2020-11-24 17:17                 ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-24 17:17 UTC (permalink / raw)
  To: Luben Tuikov, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/24/20 12:11 PM, Luben Tuikov wrote:
> On 2020-11-24 2:50 a.m., Christian König wrote:
>> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>> No point to try recovery if device is gone, it's meaningless.
>>>>>> I think that this should go into the device specific recovery
>>>>>> function and not in the scheduler.
>>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>>> work from restarting from here
>>>>> after drm_dev_unplug was executed from amdgpu_pci_remove. It will not
>>>>> cover other places like
>>>>> job cleanup or starting new job but those should stop once the
>>>>> scheduler thread is stopped later.
>>>> Yeah, but this is rather unclean. We should probably return an error
>>>> code indicating whether the timer should be rearmed or not.
>>> Christian, this is exactly the work I told you about
>>> last Wednesday in our weekly meeting, and which I
>>> wrote to you about in an email around this time
>>> last year.
>> Yeah, that's why I'm suggesting it here as well.
> It seems you're suggesting that Andrey do it,
> while you know all too well that I've been
> working on this for some time now.
>
> I wrote to you about this in an email around this
> time last year, and I discussed it at the
> Wednesday meeting.
>
> You could've mentioned that here the first time.


Luben, I actually strongly prefer that you do it and share your patch with me,
since I don't want to do unneeded refactoring which will conflict with your work.
Also, please use drm-misc for this, since it's not amdgpu-specific work and it
will be easier for me.

Andrey


>
>>> So what do we do now?
>> Split your patches into smaller parts and submit them chunk by chunk.
>>
>> E.g. renames first and then functional changes grouped by area they change.
> I have, but my final patch, a tiny one which implements
> the core reason for the change, seems buggy, and I'm
> looking for a way to debug it.
>
> Regards,
> Luben
>
>
>> Regards,
>> Christian.
>>
>>> I can submit those changes without the last part,
>>> which builds on this change.
>>>
>>> I'm still testing the last part and was hoping
>>> to submit it all in one sequence of patches,
>>> after my testing.
>>>
>>> Regards,
>>> Luben
>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>>>     drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>>>     drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>>>     drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>>>     drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>>>     include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>>>     7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> index d56f402..d0b0021 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>>>> amdgpu_ring *ring,
>>>>>>>               r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>                        num_hw_submission, amdgpu_job_hang_limit,
>>>>>>> -                   timeout, ring->name);
>>>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>>>             if (r) {
>>>>>>>                 DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>                       ring->name);
>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> index cd46c88..7678287 100644
>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>>>           ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>>>                      etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>>>> +                 gpu->drm);
>>>>>>>         if (ret)
>>>>>>>             return ret;
>>>>>>>     diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>>>> *pipe, const char *name)
>>>>>>>           return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>>>                       lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>>>> -                  name);
>>>>>>> +                  name,
>>>>>>> +                  pipe->ldev->ddev);
>>>>>>>     }
>>>>>>>       void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> index 30e7b71..37b03b01 100644
>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>>>> *pfdev)
>>>>>>>             ret = drm_sched_init(&js->queue[j].sched,
>>>>>>>                          &panfrost_sched_ops,
>>>>>>>                          1, 0, msecs_to_jiffies(500),
>>>>>>> -                     "pan_js");
>>>>>>> +                     "pan_js", pfdev->ddev);
>>>>>>>             if (ret) {
>>>>>>>                 dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>>>> ret);
>>>>>>>                 goto err_sched;
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> index c3f0bd0..95db8c6 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> @@ -53,6 +53,7 @@
>>>>>>>     #include <drm/drm_print.h>
>>>>>>>     #include <drm/gpu_scheduler.h>
>>>>>>>     #include <drm/spsc_queue.h>
>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>       #define CREATE_TRACE_POINTS
>>>>>>>     #include "gpu_scheduler_trace.h"
>>>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>>>> work_struct *work)
>>>>>>>         struct drm_gpu_scheduler *sched;
>>>>>>>         struct drm_sched_job *job;
>>>>>>>     +    int idx;
>>>>>>> +
>>>>>>>         sched = container_of(work, struct drm_gpu_scheduler,
>>>>>>> work_tdr.work);
>>>>>>>     +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>>>> scheduler:%s",
>>>>>>> +             __func__, sched->name);
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>         /* Protects against concurrent deletion in
>>>>>>> drm_sched_get_cleanup_job */
>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>         job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>>>> work_struct *work)
>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>         drm_sched_start_timeout(sched);
>>>>>>>         spin_unlock(&sched->job_list_lock);
>>>>>>> +
>>>>>>> +    drm_dev_exit(idx);
>>>>>>>     }
>>>>>>>        /**
>>>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>                unsigned hw_submission,
>>>>>>>                unsigned hang_limit,
>>>>>>>                long timeout,
>>>>>>> -           const char *name)
>>>>>>> +           const char *name,
>>>>>>> +           struct drm_device *ddev)
>>>>>>>     {
>>>>>>>         int i, ret;
>>>>>>>         sched->ops = ops;
>>>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>         sched->name = name;
>>>>>>>         sched->timeout = timeout;
>>>>>>>         sched->hang_limit = hang_limit;
>>>>>>> +    sched->ddev = ddev;
>>>>>>>         for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>>>> i++)
>>>>>>>             drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>>>     diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> index 0747614..f5076e5 100644
>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                      &v3d_bin_sched_ops,
>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                 "v3d_bin");
>>>>>>> +                 "v3d_bin",
>>>>>>> +                 &v3d->drm);
>>>>>>>         if (ret) {
>>>>>>>             dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>>>> %d.", ret);
>>>>>>>             return ret;
>>>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                      &v3d_render_sched_ops,
>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                 "v3d_render");
>>>>>>> +                 "v3d_render",
>>>>>>> +                 &v3d->drm);
>>>>>>>         if (ret) {
>>>>>>>             dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>>>> %d.",
>>>>>>>                 ret);
>>>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                      &v3d_tfu_sched_ops,
>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                 "v3d_tfu");
>>>>>>> +                 "v3d_tfu",
>>>>>>> +                 &v3d->drm);
>>>>>>>         if (ret) {
>>>>>>>             dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>>>                 ret);
>>>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                          &v3d_csd_sched_ops,
>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                     "v3d_csd");
>>>>>>> +                     "v3d_csd",
>>>>>>> +                     &v3d->drm);
>>>>>>>             if (ret) {
>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>>>> %d.",
>>>>>>>                     ret);
>>>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                          &v3d_cache_clean_sched_ops,
>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                     "v3d_cache_clean");
>>>>>>> +                     "v3d_cache_clean",
>>>>>>> +                     &v3d->drm);
>>>>>>>             if (ret) {
>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>>>> scheduler: %d.",
>>>>>>>                     ret);
>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>> index 9243655..a980709 100644
>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>> @@ -32,6 +32,7 @@
>>>>>>>       struct drm_gpu_scheduler;
>>>>>>>     struct drm_sched_rq;
>>>>>>> +struct drm_device;
>>>>>>>       /* These are often used as an (initial) index
>>>>>>>      * to an array, and as such should start at 0.
>>>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>>>      * @score: score to help loadbalancer pick an idle sched
>>>>>>>      * @ready: marks if the underlying HW is ready to work
>>>>>>>      * @free_guilty: A hint to the timeout handler to free the guilty job.
>>>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>>>      *
>>>>>>>      * One scheduler is implemented for each hardware ring.
>>>>>>>      */
>>>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>>>         atomic_t                        score;
>>>>>>>         bool                ready;
>>>>>>>         bool                free_guilty;
>>>>>>> +    struct drm_device        *ddev;
>>>>>>>     };
>>>>>>>       int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>                const struct drm_sched_backend_ops *ops,
>>>>>>>                uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>>>> -           const char *name);
>>>>>>> +           const char *name,
>>>>>>> +           struct drm_device *ddev);
>>>>>>>       void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>     int drm_sched_job_init(struct drm_sched_job *job,
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
@ 2020-11-24 17:17                 ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-24 17:17 UTC (permalink / raw)
  To: Luben Tuikov, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/24/20 12:11 PM, Luben Tuikov wrote:
> On 2020-11-24 2:50 a.m., Christian König wrote:
>> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>> No point to try recovery if device is gone, it's meaningless.
>>>>>> I think that this should go into the device specific recovery
>>>>>> function and not in the scheduler.
>>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>>> work from restarting from here after drm_dev_unplug has been executed
>>>>> from amdgpu_pci_remove. It will not cover other places, like job
>>>>> cleanup or starting a new job, but those should stop once the
>>>>> scheduler thread is stopped later.
>>>> Yeah, but this is rather unclean. We should probably return an error
>>>> code instead, indicating whether the timer should be rearmed or not.
>>> Christian, this is exactly my work I told you about
>>> last week on Wednesday in our weekly meeting. And
>>> which I wrote to you in an email last year about this
>>> time.
>> Yeah, that's why I'm suggesting it here as well.
> It seems you're suggesting that Andrey do it, while
> all too well you know I've been working on this
> for some time now.
>
> I wrote you about this last year same time
> in an email. And I discussed it on the Wednesday
> meeting.
>
> You could've mentioned that here the first time.


Luben, I actually strongly prefer that you do it and share your patch with
me, since I don't want to do unneeded refactoring which will conflict with
your work. Also, please use drm-misc for this, since it's not amdgpu-specific
work and it will be easier for me.

Andrey
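The early-bail pattern under discussion can be modeled outside the kernel. The sketch below is a hypothetical userspace simulation of the drm_dev_enter()/drm_dev_exit()/drm_dev_unplug() semantics (the real implementation is SRCU-based; all `sim_*` names are invented for illustration). It shows why wrapping the whole timeout handler in the guard both skips recovery and avoids rearming the timer once the device is gone:

```c
#include <stdbool.h>

/* Toy stand-in for the unplug state tracked by struct drm_device. */
struct sim_dev {
    bool unplugged;   /* set by the simulated drm_dev_unplug() */
    int  active;      /* critical sections currently entered */
};

/* Models drm_dev_enter(): refuses entry once the device is gone. */
static bool sim_dev_enter(struct sim_dev *d)
{
    if (d->unplugged)
        return false;
    d->active++;
    return true;
}

/* Models drm_dev_exit(). */
static void sim_dev_exit(struct sim_dev *d)
{
    d->active--;
}

/* Models drm_dev_unplug(), called from the PCI remove path. */
static void sim_dev_unplug(struct sim_dev *d)
{
    d->unplugged = true;
}

/*
 * Models the guarded drm_sched_job_timedout() from the patch:
 * returns true if recovery ran (and the timer would be rearmed),
 * false if the device was gone and recovery was skipped entirely.
 */
static bool sim_job_timedout(struct sim_dev *d)
{
    if (!sim_dev_enter(d))
        return false;  /* device unplugged: skip recovery */
    /* ... job recovery and drm_sched_start_timeout() would run here ... */
    sim_dev_exit(d);
    return true;
}
```

Because the guard covers the whole handler, the drm_sched_start_timeout() call at its end is skipped too, which is what stops any further recovery work from being scheduled after hot unplug.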


>
>>> So what do we do now?
>> Split your patches into smaller parts and submit them chunk by chunk.
>>
>> E.g. renames first and then functional changes grouped by area they change.
> I have, but my final patch, a tiny one which implements
> the core reason for the change, seems buggy, and I'm looking
> for a way to debug it.
>
> Regards,
> Luben
>
>
>> Regards,
>> Christian.
>>
>>> I can submit those changes without the last part,
>>> which builds on this change.
>>>
>>> I'm still testing the last part and was hoping
>>> to submit it all in one sequence of patches,
>>> after my testing.
>>>
>>> Regards,
>>> Luben
>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>>>     drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>>>     drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>>>     drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>>>     drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>>>     include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>>>     7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> index d56f402..d0b0021 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>>>> amdgpu_ring *ring,
>>>>>>>               r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>                        num_hw_submission, amdgpu_job_hang_limit,
>>>>>>> -                   timeout, ring->name);
>>>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>>>             if (r) {
>>>>>>>                 DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>                       ring->name);
>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> index cd46c88..7678287 100644
>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>>>           ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>>>                      etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>>>> +                 gpu->drm);
>>>>>>>         if (ret)
>>>>>>>             return ret;
>>>>>>>     diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>>>> *pipe, const char *name)
>>>>>>>           return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>>>                       lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>>>> -                  name);
>>>>>>> +                  name,
>>>>>>> +                  pipe->ldev->ddev);
>>>>>>>     }
>>>>>>>       void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> index 30e7b71..37b03b01 100644
>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>>>> *pfdev)
>>>>>>>             ret = drm_sched_init(&js->queue[j].sched,
>>>>>>>                          &panfrost_sched_ops,
>>>>>>>                          1, 0, msecs_to_jiffies(500),
>>>>>>> -                     "pan_js");
>>>>>>> +                     "pan_js", pfdev->ddev);
>>>>>>>             if (ret) {
>>>>>>>                 dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>>>> ret);
>>>>>>>                 goto err_sched;
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> index c3f0bd0..95db8c6 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> @@ -53,6 +53,7 @@
>>>>>>>     #include <drm/drm_print.h>
>>>>>>>     #include <drm/gpu_scheduler.h>
>>>>>>>     #include <drm/spsc_queue.h>
>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>       #define CREATE_TRACE_POINTS
>>>>>>>     #include "gpu_scheduler_trace.h"
>>>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>>>> work_struct *work)
>>>>>>>         struct drm_gpu_scheduler *sched;
>>>>>>>         struct drm_sched_job *job;
>>>>>>>     +    int idx;
>>>>>>> +
>>>>>>>         sched = container_of(work, struct drm_gpu_scheduler,
>>>>>>> work_tdr.work);
>>>>>>>     +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>>>> scheduler:%s",
>>>>>>> +             __func__, sched->name);
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>         /* Protects against concurrent deletion in
>>>>>>> drm_sched_get_cleanup_job */
>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>         job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>>>> work_struct *work)
>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>         drm_sched_start_timeout(sched);
>>>>>>>         spin_unlock(&sched->job_list_lock);
>>>>>>> +
>>>>>>> +    drm_dev_exit(idx);
>>>>>>>     }
>>>>>>>        /**
>>>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>                unsigned hw_submission,
>>>>>>>                unsigned hang_limit,
>>>>>>>                long timeout,
>>>>>>> -           const char *name)
>>>>>>> +           const char *name,
>>>>>>> +           struct drm_device *ddev)
>>>>>>>     {
>>>>>>>         int i, ret;
>>>>>>>         sched->ops = ops;
>>>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>         sched->name = name;
>>>>>>>         sched->timeout = timeout;
>>>>>>>         sched->hang_limit = hang_limit;
>>>>>>> +    sched->ddev = ddev;
>>>>>>>         for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>>>> i++)
>>>>>>>             drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>>>     diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> index 0747614..f5076e5 100644
>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                      &v3d_bin_sched_ops,
>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                 "v3d_bin");
>>>>>>> +                 "v3d_bin",
>>>>>>> +                 &v3d->drm);
>>>>>>>         if (ret) {
>>>>>>>             dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>>>> %d.", ret);
>>>>>>>             return ret;
>>>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                      &v3d_render_sched_ops,
>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                 "v3d_render");
>>>>>>> +                 "v3d_render",
>>>>>>> +                 &v3d->drm);
>>>>>>>         if (ret) {
>>>>>>>             dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>>>> %d.",
>>>>>>>                 ret);
>>>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                      &v3d_tfu_sched_ops,
>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                 "v3d_tfu");
>>>>>>> +                 "v3d_tfu",
>>>>>>> +                 &v3d->drm);
>>>>>>>         if (ret) {
>>>>>>>             dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>>>                 ret);
>>>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                          &v3d_csd_sched_ops,
>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                     "v3d_csd");
>>>>>>> +                     "v3d_csd",
>>>>>>> +                     &v3d->drm);
>>>>>>>             if (ret) {
>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>>>> %d.",
>>>>>>>                     ret);
>>>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                          &v3d_cache_clean_sched_ops,
>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                     "v3d_cache_clean");
>>>>>>> +                     "v3d_cache_clean",
>>>>>>> +                     &v3d->drm);
>>>>>>>             if (ret) {
>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>>>> scheduler: %d.",
>>>>>>>                     ret);
>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>> index 9243655..a980709 100644
>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>> @@ -32,6 +32,7 @@
>>>>>>>       struct drm_gpu_scheduler;
>>>>>>>     struct drm_sched_rq;
>>>>>>> +struct drm_device;
>>>>>>>       /* These are often used as an (initial) index
>>>>>>>      * to an array, and as such should start at 0.
>>>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>>>      * @score: score to help loadbalancer pick an idle sched
>>>>>>>      * @ready: marks if the underlying HW is ready to work
>>>>>>>      * @free_guilty: A hint to the timeout handler to free the guilty job.
>>>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>>>      *
>>>>>>>      * One scheduler is implemented for each hardware ring.
>>>>>>>      */
>>>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>>>         atomic_t                        score;
>>>>>>>         bool                ready;
>>>>>>>         bool                free_guilty;
>>>>>>> +    struct drm_device        *ddev;
>>>>>>>     };
>>>>>>>       int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>                const struct drm_sched_backend_ops *ops,
>>>>>>>                uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>>>> -           const char *name);
>>>>>>> +           const char *name,
>>>>>>> +           struct drm_device *ddev);
>>>>>>>       void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>     int drm_sched_job_init(struct drm_sched_job *job,
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-24 17:11               ` Luben Tuikov
@ 2020-11-24 17:40                 ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-24 17:40 UTC (permalink / raw)
  To: Luben Tuikov, christian.koenig, Andrey Grodzovsky, amd-gfx,
	dri-devel, daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 24.11.20 um 18:11 schrieb Luben Tuikov:
> On 2020-11-24 2:50 a.m., Christian König wrote:
>> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>> No point to try recovery if device is gone, it's meaningless.
>>>>>> I think that this should go into the device specific recovery
>>>>>> function and not in the scheduler.
>>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>>> work from restarting from here after drm_dev_unplug has been executed
>>>>> from amdgpu_pci_remove. It will not cover other places, like job
>>>>> cleanup or starting a new job, but those should stop once the
>>>>> scheduler thread is stopped later.
>>>> Yeah, but this is rather unclean. We should probably return an error
>>>> code instead, indicating whether the timer should be rearmed or not.
>>> Christian, this is exactly my work I told you about
>>> last week on Wednesday in our weekly meeting. And
>>> which I wrote to you in an email last year about this
>>> time.
>> Yeah, that's why I'm suggesting it here as well.
> It seems you're suggesting that Andrey do it, while
> all too well you know I've been working on this
> for some time now.

Changing the return value is just a minimal change and I didn't want to 
block Andrey in any way.

>
> I wrote you about this last year same time
> in an email. And I discussed it on the Wednesday
> meeting.
>
> You could've mentioned that here the first time.
>
>>> So what do we do now?
>> Split your patches into smaller parts and submit them chunk by chunk.
>>
>> E.g. renames first and then functional changes grouped by area they change.
> I have, but my final patch, a tiny one which implements
> the core reason for the change, seems buggy, and I'm looking
> for a way to debug it.

Just send it out in chunks; e.g., non-functional changes like renames 
shouldn't cause any problems, and having them in the branch early 
minimizes conflicts with work from others.

Regards,
Christian.
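The alternative Christian describes, having the driver callback report a status instead of guarding inside the scheduler, can be sketched as below. This is an illustrative shape only (mainline later adopted something along these lines with enum drm_gpu_sched_stat, but the `sim_*` names and types here are invented for the sketch): the driver's timed-out callback tells the scheduler whether the device is still present, and the scheduler rearms the timeout timer only on a nominal status:

```c
#include <stdbool.h>

/* Illustrative status values; not the real kernel enum. */
enum sim_sched_stat {
    SIM_SCHED_STAT_NOMINAL,  /* recovery done, keep the timer running */
    SIM_SCHED_STAT_ENODEV,   /* device gone, do not rearm the timer */
};

/* Driver-side timed-out callback: decides and reports the status. */
static enum sim_sched_stat sim_timedout_job(bool device_present)
{
    if (!device_present)
        return SIM_SCHED_STAT_ENODEV;
    /* ... device-specific reset/recovery would run here ... */
    return SIM_SCHED_STAT_NOMINAL;
}

/*
 * Scheduler side: rearm the timeout timer only on nominal status.
 * Returns whether the timer was rearmed.
 */
static bool sim_sched_handle_timeout(bool device_present)
{
    return sim_timedout_job(device_present) == SIM_SCHED_STAT_NOMINAL;
}
```

The design difference from the drm_dev_enter() guard is layering: here the scheduler core stays device-agnostic and the unplug check lives in the driver's own recovery function, which is what the review comments above argue for.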

>
> Regards,
> Luben
>
>
>> Regards,
>> Christian.
>>
>>> I can submit those changes without the last part,
>>> which builds on this change.
>>>
>>> I'm still testing the last part and was hoping
>>> to submit it all in one sequence of patches,
>>> after my testing.
>>>
>>> Regards,
>>> Luben
>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>>>     drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>>>     drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>>>     drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>>>     drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>>>     include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>>>     7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> index d56f402..d0b0021 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>>>> amdgpu_ring *ring,
>>>>>>>               r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>                        num_hw_submission, amdgpu_job_hang_limit,
>>>>>>> -                   timeout, ring->name);
>>>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>>>             if (r) {
>>>>>>>                 DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>                       ring->name);
>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> index cd46c88..7678287 100644
>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>>>           ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>>>                      etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>>>> +                 gpu->drm);
>>>>>>>         if (ret)
>>>>>>>             return ret;
>>>>>>>     diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>>>> *pipe, const char *name)
>>>>>>>           return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>>>                       lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>>>> -                  name);
>>>>>>> +                  name,
>>>>>>> +                  pipe->ldev->ddev);
>>>>>>>     }
>>>>>>>       void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> index 30e7b71..37b03b01 100644
>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>>>> *pfdev)
>>>>>>>             ret = drm_sched_init(&js->queue[j].sched,
>>>>>>>                          &panfrost_sched_ops,
>>>>>>>                          1, 0, msecs_to_jiffies(500),
>>>>>>> -                     "pan_js");
>>>>>>> +                     "pan_js", pfdev->ddev);
>>>>>>>             if (ret) {
>>>>>>>                 dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>>>> ret);
>>>>>>>                 goto err_sched;
>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> index c3f0bd0..95db8c6 100644
>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>> @@ -53,6 +53,7 @@
>>>>>>>     #include <drm/drm_print.h>
>>>>>>>     #include <drm/gpu_scheduler.h>
>>>>>>>     #include <drm/spsc_queue.h>
>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>       #define CREATE_TRACE_POINTS
>>>>>>>     #include "gpu_scheduler_trace.h"
>>>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>>>> work_struct *work)
>>>>>>>         struct drm_gpu_scheduler *sched;
>>>>>>>         struct drm_sched_job *job;
>>>>>>>     +    int idx;
>>>>>>> +
>>>>>>>         sched = container_of(work, struct drm_gpu_scheduler,
>>>>>>> work_tdr.work);
>>>>>>>     +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>>>> scheduler:%s",
>>>>>>> +             __func__, sched->name);
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>         /* Protects against concurrent deletion in
>>>>>>> drm_sched_get_cleanup_job */
>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>         job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>>>> work_struct *work)
>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>         drm_sched_start_timeout(sched);
>>>>>>>         spin_unlock(&sched->job_list_lock);
>>>>>>> +
>>>>>>> +    drm_dev_exit(idx);
>>>>>>>     }
>>>>>>>        /**
>>>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>                unsigned hw_submission,
>>>>>>>                unsigned hang_limit,
>>>>>>>                long timeout,
>>>>>>> -           const char *name)
>>>>>>> +           const char *name,
>>>>>>> +           struct drm_device *ddev)
>>>>>>>     {
>>>>>>>         int i, ret;
>>>>>>>         sched->ops = ops;
>>>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>         sched->name = name;
>>>>>>>         sched->timeout = timeout;
>>>>>>>         sched->hang_limit = hang_limit;
>>>>>>> +    sched->ddev = ddev;
>>>>>>>         for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>>>> i++)
>>>>>>>             drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>>>     diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> index 0747614..f5076e5 100644
>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                      &v3d_bin_sched_ops,
>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                 "v3d_bin");
>>>>>>> +                 "v3d_bin",
>>>>>>> +                 &v3d->drm);
>>>>>>>         if (ret) {
>>>>>>>             dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>>>> %d.", ret);
>>>>>>>             return ret;
>>>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                      &v3d_render_sched_ops,
>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                 "v3d_render");
>>>>>>> +                 "v3d_render",
>>>>>>> +                 &v3d->drm);
>>>>>>>         if (ret) {
>>>>>>>             dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>>>> %d.",
>>>>>>>                 ret);
>>>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                      &v3d_tfu_sched_ops,
>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                 "v3d_tfu");
>>>>>>> +                 "v3d_tfu",
>>>>>>> +                 &v3d->drm);
>>>>>>>         if (ret) {
>>>>>>>             dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>>>                 ret);
>>>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                          &v3d_csd_sched_ops,
>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                     "v3d_csd");
>>>>>>> +                     "v3d_csd",
>>>>>>> +                     &v3d->drm);
>>>>>>>             if (ret) {
>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>>>> %d.",
>>>>>>>                     ret);
>>>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>                          &v3d_cache_clean_sched_ops,
>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>> -                     "v3d_cache_clean");
>>>>>>> +                     "v3d_cache_clean",
>>>>>>> +                     &v3d->drm);
>>>>>>>             if (ret) {
>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>>>> scheduler: %d.",
>>>>>>>                     ret);
>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>> index 9243655..a980709 100644
>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>> @@ -32,6 +32,7 @@
>>>>>>>       struct drm_gpu_scheduler;
>>>>>>>     struct drm_sched_rq;
>>>>>>> +struct drm_device;
>>>>>>>       /* These are often used as an (initial) index
>>>>>>>      * to an array, and as such should start at 0.
>>>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>>>      * @score: score to help loadbalancer pick a idle sched
>>>>>>>      * @ready: marks if the underlying HW is ready to work
>>>>>>>      * @free_guilty: A hit to time out handler to free the guilty job.
>>>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>>>      *
>>>>>>>      * One scheduler is implemented for each hardware ring.
>>>>>>>      */
>>>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>>>         atomic_t                        score;
>>>>>>>         bool                ready;
>>>>>>>         bool                free_guilty;
>>>>>>> +    struct drm_device        *ddev;
>>>>>>>     };
>>>>>>>       int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>                const struct drm_sched_backend_ops *ops,
>>>>>>>                uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>>>> -           const char *name);
>>>>>>> +           const char *name,
>>>>>>> +           struct drm_device *ddev);
>>>>>>>       void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>     int drm_sched_job_init(struct drm_sched_job *job,
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
@ 2020-11-24 17:40                 ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-24 17:40 UTC (permalink / raw)
  To: Luben Tuikov, christian.koenig, Andrey Grodzovsky, amd-gfx,
	dri-devel, daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 24.11.20 um 18:11 schrieb Luben Tuikov:
> On 2020-11-24 2:50 a.m., Christian König wrote:
>> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>> No point to try recovery if device is gone, it's meaningless.
>>>>>> I think that this should go into the device specific recovery
>>>>>> function and not in the scheduler.
>>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>>> work from restarting from here
>>>>> after drm_dev_unplug was executed from amdgpu_pci_remove. It will not
>>>>> cover other places like
>>>>> job cleanup or starting a new job, but those should stop once the
>>>>> scheduler thread is stopped later.
>>>> Yeah, but this is rather unclean. We should probably return an error
>>>> code instead if the timer should be rearmed or not.
>>> Christian, this is exactly my work I told you about
>>> last week on Wednesday in our weekly meeting. And
>>> which I wrote to you in an email last year about this
>>> time.
>> Yeah, that's why I'm suggesting it here as well.
> It seems you're suggesting that Andrey do it, while
> all too well you know I've been working on this
> for some time now.

Changing the return value is only a minimal change, and I didn't want to 
block Andrey in any way.

>
> I wrote you about this last year same time
> in an email. And I discussed it on the Wednesday
> meeting.
>
> You could've mentioned that here the first time.
>
>>> So what do we do now?
>> Split your patches into smaller parts and submit them chunk by chunk.
>>
>> E.g. renames first and then functional changes grouped by area they change.
> I have, but my final patch, a tiny one but which implements
> the core reason for the change seems buggy, and I'm looking
> for a way to debug it.

Just send it out in chunks, e.g. non-functional changes like renames 
shouldn't cause any problems, and having them in the branch early 
minimizes conflicts with work from others.

Regards,
Christian.

>
> Regards,
> Luben
>
>
>> Regards,
>> Christian.
>>
>>> I can submit those changes without the last part,
>>> which builds on this change.
>>>
>>> I'm still testing the last part and was hoping
>>> to submit it all in one sequence of patches,
>>> after my testing.
>>>
>>> Regards,
>>> Luben
>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>> Christian.
>>>>>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-24 17:17                 ` Andrey Grodzovsky
@ 2020-11-24 17:41                   ` Luben Tuikov
  -1 siblings, 0 replies; 212+ messages in thread
From: Luben Tuikov @ 2020-11-24 17:41 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

On 2020-11-24 12:17 p.m., Andrey Grodzovsky wrote:
> 
> On 11/24/20 12:11 PM, Luben Tuikov wrote:
>> On 2020-11-24 2:50 a.m., Christian König wrote:
>>> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>>>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>> No point to try recovery if device is gone, it's meaningless.
>>>>>>> I think that this should go into the device specific recovery
>>>>>>> function and not in the scheduler.
>>>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>>>> work from restarting from here
>>>>>> after drm_dev_unplug was executed from amdgpu_pci_remove. It will not
>>>>>> cover other places like
>>>>>> job cleanup or starting a new job, but those should stop once the
>>>>>> scheduler thread is stopped later.
>>>>> Yeah, but this is rather unclean. We should probably return an error
>>>>> code instead if the timer should be rearmed or not.
>>>> Christian, this is exactly my work I told you about
>>>> last week on Wednesday in our weekly meeting. And
>>>> which I wrote to you in an email last year about this
>>>> time.
>>> Yeah, that's why I'm suggesting it here as well.
>> It seems you're suggesting that Andrey do it, while
>> all too well you know I've been working on this
>> for some time now.
>>
>> I wrote you about this last year same time
>> in an email. And I discussed it on the Wednesday
>> meeting.
>>
>> You could've mentioned that here the first time.
> 
> 
> Luben, I actually strongly prefer that you do it and share your patch with me 
> since I don't
> want to do unneeded refactoring which will conflict with your work. Also, please
> use drm-misc for this since it's not amdgpu-specific work and will be easier for me.
> 
> Andrey

No problem, Andrey--will do.

Regards,
Luben

> 
> 
>>
>>>> So what do we do now?
>>> Split your patches into smaller parts and submit them chunk by chunk.
>>>
>>> E.g. renames first and then functional changes grouped by area they change.
>> I have, but my final patch, a tiny one but which implements
>> the core reason for the change seems buggy, and I'm looking
>> for a way to debug it.
>>
>> Regards,
>> Luben
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> I can submit those changes without the last part,
>>>> which builds on this change.
>>>>
>>>> I'm still testing the last part and was hoping
>>>> to submit it all in one sequence of patches,
>>>> after my testing.
>>>>
>>>> Regards,
>>>> Luben
>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>>>>     drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>>>>     drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>>>>     drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>>>>     drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>>>>     include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>>>>     7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> index d56f402..d0b0021 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>>>>> amdgpu_ring *ring,
>>>>>>>>               r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>>                        num_hw_submission, amdgpu_job_hang_limit,
>>>>>>>> -                   timeout, ring->name);
>>>>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>>>>             if (r) {
>>>>>>>>                 DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>>                       ring->name);
>>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> index cd46c88..7678287 100644
>>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>>>>           ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>>>>                      etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>>>>> +                 gpu->drm);
>>>>>>>>         if (ret)
>>>>>>>>             return ret;
>>>>>>>>     diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>>>>> *pipe, const char *name)
>>>>>>>>           return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>>>>                       lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>>>>> -                  name);
>>>>>>>> +                  name,
>>>>>>>> +                  pipe->ldev->ddev);
>>>>>>>>     }
>>>>>>>>       void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> index 30e7b71..37b03b01 100644
>>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>>>>> *pfdev)
>>>>>>>>             ret = drm_sched_init(&js->queue[j].sched,
>>>>>>>>                          &panfrost_sched_ops,
>>>>>>>>                          1, 0, msecs_to_jiffies(500),
>>>>>>>> -                     "pan_js");
>>>>>>>> +                     "pan_js", pfdev->ddev);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>>>>> ret);
>>>>>>>>                 goto err_sched;
>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> index c3f0bd0..95db8c6 100644
>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> @@ -53,6 +53,7 @@
>>>>>>>>     #include <drm/drm_print.h>
>>>>>>>>     #include <drm/gpu_scheduler.h>
>>>>>>>>     #include <drm/spsc_queue.h>
>>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>>       #define CREATE_TRACE_POINTS
>>>>>>>>     #include "gpu_scheduler_trace.h"
>>>>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>>>>> work_struct *work)
>>>>>>>>         struct drm_gpu_scheduler *sched;
>>>>>>>>         struct drm_sched_job *job;
>>>>>>>>     +    int idx;
>>>>>>>> +
>>>>>>>>         sched = container_of(work, struct drm_gpu_scheduler,
>>>>>>>> work_tdr.work);
>>>>>>>>     +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>>>>> scheduler:%s",
>>>>>>>> +             __func__, sched->name);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         /* Protects against concurrent deletion in
>>>>>>>> drm_sched_get_cleanup_job */
>>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>>         job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>>>>> work_struct *work)
>>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>>         drm_sched_start_timeout(sched);
>>>>>>>>         spin_unlock(&sched->job_list_lock);
>>>>>>>> +
>>>>>>>> +    drm_dev_exit(idx);
>>>>>>>>     }
>>>>>>>>        /**
>>>>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>                unsigned hw_submission,
>>>>>>>>                unsigned hang_limit,
>>>>>>>>                long timeout,
>>>>>>>> -           const char *name)
>>>>>>>> +           const char *name,
>>>>>>>> +           struct drm_device *ddev)
>>>>>>>>     {
>>>>>>>>         int i, ret;
>>>>>>>>         sched->ops = ops;
>>>>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>         sched->name = name;
>>>>>>>>         sched->timeout = timeout;
>>>>>>>>         sched->hang_limit = hang_limit;
>>>>>>>> +    sched->ddev = ddev;
>>>>>>>>         for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>>>>> i++)
>>>>>>>>             drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>>>>     diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> index 0747614..f5076e5 100644
>>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_bin_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_bin");
>>>>>>>> +                 "v3d_bin",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>>>>> %d.", ret);
>>>>>>>>             return ret;
>>>>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_render_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_render");
>>>>>>>> +                 "v3d_render",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>>>>> %d.",
>>>>>>>>                 ret);
>>>>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_tfu_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_tfu");
>>>>>>>> +                 "v3d_tfu",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>>>>                 ret);
>>>>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                          &v3d_csd_sched_ops,
>>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                     "v3d_csd");
>>>>>>>> +                     "v3d_csd",
>>>>>>>> +                     &v3d->drm);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>>>>> %d.",
>>>>>>>>                     ret);
>>>>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                          &v3d_cache_clean_sched_ops,
>>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                     "v3d_cache_clean");
>>>>>>>> +                     "v3d_cache_clean",
>>>>>>>> +                     &v3d->drm);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>>>>> scheduler: %d.",
>>>>>>>>                     ret);
>>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>>> index 9243655..a980709 100644
>>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>>> @@ -32,6 +32,7 @@
>>>>>>>>       struct drm_gpu_scheduler;
>>>>>>>>     struct drm_sched_rq;
>>>>>>>> +struct drm_device;
>>>>>>>>       /* These are often used as an (initial) index
>>>>>>>>      * to an array, and as such should start at 0.
>>>>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>>>>      * @score: score to help loadbalancer pick a idle sched
>>>>>>>>      * @ready: marks if the underlying HW is ready to work
>>>>>>>>      * @free_guilty: A hit to time out handler to free the guilty job.
>>>>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>>>>      *
>>>>>>>>      * One scheduler is implemented for each hardware ring.
>>>>>>>>      */
>>>>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>>>>         atomic_t                        score;
>>>>>>>>         bool                ready;
>>>>>>>>         bool                free_guilty;
>>>>>>>> +    struct drm_device        *ddev;
>>>>>>>>     };
>>>>>>>>       int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>                const struct drm_sched_backend_ops *ops,
>>>>>>>>                uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>>>>> -           const char *name);
>>>>>>>> +           const char *name,
>>>>>>>> +           struct drm_device *ddev);
>>>>>>>>       void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>>     int drm_sched_job_init(struct drm_sched_job *job,
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
@ 2020-11-24 17:41                   ` Luben Tuikov
  0 siblings, 0 replies; 212+ messages in thread
From: Luben Tuikov @ 2020-11-24 17:41 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

On 2020-11-24 12:17 p.m., Andrey Grodzovsky wrote:
> 
> On 11/24/20 12:11 PM, Luben Tuikov wrote:
>> On 2020-11-24 2:50 a.m., Christian König wrote:
>>> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>>>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>> No point to try recovery if device is gone, it's meaningless.
>>>>>>> I think that this should go into the device specific recovery
>>>>>>> function and not in the scheduler.
>>>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>>>> work from restarting from here
>>>>>> after drm_dev_unplug was executed from amdgpu_pci_remove. It will not
>>>>>> cover other places like
>>>>>> job cleanup or starting a new job, but those should stop once the
>>>>>> scheduler thread is stopped later.
>>>>> Yeah, but this is rather unclean. We should probably return an error
>>>>> code instead if the timer should be rearmed or not.
>>>> Christian, this is exactly my work I told you about
>>>> last week on Wednesday in our weekly meeting. And
>>>> which I wrote to you in an email last year about this
>>>> time.
>>> Yeah, that's why I'm suggesting it here as well.
>> It seems you're suggesting that Andrey do it, while
>> all too well you know I've been working on this
>> for some time now.
>>
>> I wrote you about this last year same time
>> in an email. And I discussed it on the Wednesday
>> meeting.
>>
>> You could've mentioned that here the first time.
> 
> 
> Luben, I actually strongly prefer that you do it and share your patch with me,
> since I don't
> want to do unneeded refactoring which will conflict with your work. Also, please
> use drm-misc for this since it's not amdgpu-specific work and will be easier for me.
> 
> Andrey

No problem, Andrey--will do.

Regards,
Luben

> 
> 
>>
>>>> So what do we do now?
>>> Split your patches into smaller parts and submit them chunk by chunk.
>>>
>>> E.g. renames first and then functional changes grouped by area they change.
>> I have, but my final patch, a tiny one but which implements
>> the core reason for the change seems buggy, and I'm looking
>> for a way to debug it.
>>
>> Regards,
>> Luben
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> I can submit those changes without the last part,
>>>> which builds on this change.
>>>>
>>>> I'm still testing the last part and was hoping
>>>> to submit it all in one sequence of patches,
>>>> after my testing.
>>>>
>>>> Regards,
>>>> Luben
>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>>>>     drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>>>>     drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>>>>     drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>>>>     drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>>>>     include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>>>>     7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> index d56f402..d0b0021 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>>>>> amdgpu_ring *ring,
>>>>>>>>               r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>>                        num_hw_submission, amdgpu_job_hang_limit,
>>>>>>>> -                   timeout, ring->name);
>>>>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>>>>             if (r) {
>>>>>>>>                 DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>>                       ring->name);
>>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> index cd46c88..7678287 100644
>>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>>>>           ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>>>>                      etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>>>>> +                 gpu->drm);
>>>>>>>>         if (ret)
>>>>>>>>             return ret;
>>>>>>>>     diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>>>>> *pipe, const char *name)
>>>>>>>>           return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>>>>                       lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>>>>> -                  name);
>>>>>>>> +                  name,
>>>>>>>> +                  pipe->ldev->ddev);
>>>>>>>>     }
>>>>>>>>       void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> index 30e7b71..37b03b01 100644
>>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>>>>> *pfdev)
>>>>>>>>             ret = drm_sched_init(&js->queue[j].sched,
>>>>>>>>                          &panfrost_sched_ops,
>>>>>>>>                          1, 0, msecs_to_jiffies(500),
>>>>>>>> -                     "pan_js");
>>>>>>>> +                     "pan_js", pfdev->ddev);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>>>>> ret);
>>>>>>>>                 goto err_sched;
>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> index c3f0bd0..95db8c6 100644
>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> @@ -53,6 +53,7 @@
>>>>>>>>     #include <drm/drm_print.h>
>>>>>>>>     #include <drm/gpu_scheduler.h>
>>>>>>>>     #include <drm/spsc_queue.h>
>>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>>       #define CREATE_TRACE_POINTS
>>>>>>>>     #include "gpu_scheduler_trace.h"
>>>>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>>>>> work_struct *work)
>>>>>>>>         struct drm_gpu_scheduler *sched;
>>>>>>>>         struct drm_sched_job *job;
>>>>>>>>     +    int idx;
>>>>>>>> +
>>>>>>>>         sched = container_of(work, struct drm_gpu_scheduler,
>>>>>>>> work_tdr.work);
>>>>>>>>     +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>>>>> scheduler:%s",
>>>>>>>> +             __func__, sched->name);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         /* Protects against concurrent deletion in
>>>>>>>> drm_sched_get_cleanup_job */
>>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>>         job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>>>>> work_struct *work)
>>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>>         drm_sched_start_timeout(sched);
>>>>>>>>         spin_unlock(&sched->job_list_lock);
>>>>>>>> +
>>>>>>>> +    drm_dev_exit(idx);
>>>>>>>>     }
>>>>>>>>        /**
>>>>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>                unsigned hw_submission,
>>>>>>>>                unsigned hang_limit,
>>>>>>>>                long timeout,
>>>>>>>> -           const char *name)
>>>>>>>> +           const char *name,
>>>>>>>> +           struct drm_device *ddev)
>>>>>>>>     {
>>>>>>>>         int i, ret;
>>>>>>>>         sched->ops = ops;
>>>>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>         sched->name = name;
>>>>>>>>         sched->timeout = timeout;
>>>>>>>>         sched->hang_limit = hang_limit;
>>>>>>>> +    sched->ddev = ddev;
>>>>>>>>         for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>>>>> i++)
>>>>>>>>             drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>>>>     diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> index 0747614..f5076e5 100644
>>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_bin_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_bin");
>>>>>>>> +                 "v3d_bin",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>>>>> %d.", ret);
>>>>>>>>             return ret;
>>>>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_render_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_render");
>>>>>>>> +                 "v3d_render",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>>>>> %d.",
>>>>>>>>                 ret);
>>>>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_tfu_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_tfu");
>>>>>>>> +                 "v3d_tfu",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>>>>                 ret);
>>>>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                          &v3d_csd_sched_ops,
>>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                     "v3d_csd");
>>>>>>>> +                     "v3d_csd",
>>>>>>>> +                     &v3d->drm);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>>>>> %d.",
>>>>>>>>                     ret);
>>>>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                          &v3d_cache_clean_sched_ops,
>>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                     "v3d_cache_clean");
>>>>>>>> +                     "v3d_cache_clean",
>>>>>>>> +                     &v3d->drm);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>>>>> scheduler: %d.",
>>>>>>>>                     ret);
>>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>>> index 9243655..a980709 100644
>>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>>> @@ -32,6 +32,7 @@
>>>>>>>>       struct drm_gpu_scheduler;
>>>>>>>>     struct drm_sched_rq;
>>>>>>>> +struct drm_device;
>>>>>>>>       /* These are often used as an (initial) index
>>>>>>>>      * to an array, and as such should start at 0.
>>>>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>>>>      * @score: score to help loadbalancer pick a idle sched
>>>>>>>>      * @ready: marks if the underlying HW is ready to work
>>>>>>>>      * @free_guilty: A hit to time out handler to free the guilty job.
>>>>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>>>>      *
>>>>>>>>      * One scheduler is implemented for each hardware ring.
>>>>>>>>      */
>>>>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>>>>         atomic_t                        score;
>>>>>>>>         bool                ready;
>>>>>>>>         bool                free_guilty;
>>>>>>>> +    struct drm_device        *ddev;
>>>>>>>>     };
>>>>>>>>       int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>                const struct drm_sched_backend_ops *ops,
>>>>>>>>                uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>>>>> -           const char *name);
>>>>>>>> +           const char *name,
>>>>>>>> +           struct drm_device *ddev);
>>>>>>>>       void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>>     int drm_sched_job_init(struct drm_sched_job *job,

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
  2020-11-24 17:40                 ` Christian König
@ 2020-11-24 17:44                   ` Luben Tuikov
  -1 siblings, 0 replies; 212+ messages in thread
From: Luben Tuikov @ 2020-11-24 17:44 UTC (permalink / raw)
  To: christian.koenig, Andrey Grodzovsky, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

On 2020-11-24 12:40 p.m., Christian König wrote:
> Am 24.11.20 um 18:11 schrieb Luben Tuikov:
>> On 2020-11-24 2:50 a.m., Christian König wrote:
>>> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>>>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>> No point to try recovery if device is gone, it's meaningless.
>>>>>>> I think that this should go into the device specific recovery
>>>>>>> function and not in the scheduler.
>>>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>>>> work from restarting from here
>>>>>> after drm_dev_unplug was executed from amdgpu_pci_remove. It will not
>>>>>> cover other places like
>>>>>> job cleanup or starting a new job, but those should stop once the
>>>>>> scheduler thread is stopped later.
>>>>> Yeah, but this is rather unclean. We should probably return an error
>>>>> code instead if the timer should be rearmed or not.
>>>> Christian, this is exactly my work I told you about
>>>> last week on Wednesday in our weekly meeting. And
>>>> which I wrote to you in an email last year about this
>>>> time.
>>> Yeah, that's why I'm suggesting it here as well.
>> It seems you're suggesting that Andrey do it, while
>> all too well you know I've been working on this
>> for some time now.
> 
> Changing the return value is just a minimal change and I didn't want to 
> block Andrey in any way.
> 

But it is the suggestion I had last year this time.
It is the whole root of my changes--it's a gamechanger.

>>
>> I wrote you about this last year same time
>> in an email. And I discussed it on the Wednesday
>> meeting.
>>
>> You could've mentioned that here the first time.
>>
>>>> So what do we do now?
>>> Split your patches into smaller parts and submit them chunk by chunk.
>>>
>>> E.g. renames first and then functional changes grouped by area they change.
>> I have, but my final patch, a tiny one but which implements
>> the core reason for the change seems buggy, and I'm looking
>> for a way to debug it.
> 
> Just send it out in chunks, e.g. non-functional changes like renames 
> shouldn't cause any problems, and having them in the branch early 
> minimizes conflicts with work from others.

Yeah, I agree, that's a good idea.

My final tiny patch is causing me grief and I'd rather
have had it working. :'-(

Regards,
Luben

> 
> Regards,
> Christian.
> 
>>
>> Regards,
>> Luben
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> I can submit those changes without the last part,
>>>> which builds on this change.
>>>>
>>>> I'm still testing the last part and was hoping
>>>> to submit it all in one sequence of patches,
>>>> after my testing.
>>>>
>>>> Regards,
>>>> Luben
>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>>>>     drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>>>>     drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>>>>     drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>>>>     drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>>>>     include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>>>>     7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> index d56f402..d0b0021 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>>>>> amdgpu_ring *ring,
>>>>>>>>               r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>>                        num_hw_submission, amdgpu_job_hang_limit,
>>>>>>>> -                   timeout, ring->name);
>>>>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>>>>             if (r) {
>>>>>>>>                 DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>>                       ring->name);
>>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> index cd46c88..7678287 100644
>>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>>>>           ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>>>>                      etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>>>>> +                 gpu->drm);
>>>>>>>>         if (ret)
>>>>>>>>             return ret;
>>>>>>>>     diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>>>>> *pipe, const char *name)
>>>>>>>>           return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>>>>                       lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>>>>> -                  name);
>>>>>>>> +                  name,
>>>>>>>> +                  pipe->ldev->ddev);
>>>>>>>>     }
>>>>>>>>       void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> index 30e7b71..37b03b01 100644
>>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>>>>> *pfdev)
>>>>>>>>             ret = drm_sched_init(&js->queue[j].sched,
>>>>>>>>                          &panfrost_sched_ops,
>>>>>>>>                          1, 0, msecs_to_jiffies(500),
>>>>>>>> -                     "pan_js");
>>>>>>>> +                     "pan_js", pfdev->ddev);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>>>>> ret);
>>>>>>>>                 goto err_sched;
>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> index c3f0bd0..95db8c6 100644
>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> @@ -53,6 +53,7 @@
>>>>>>>>     #include <drm/drm_print.h>
>>>>>>>>     #include <drm/gpu_scheduler.h>
>>>>>>>>     #include <drm/spsc_queue.h>
>>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>>       #define CREATE_TRACE_POINTS
>>>>>>>>     #include "gpu_scheduler_trace.h"
>>>>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>>>>> work_struct *work)
>>>>>>>>         struct drm_gpu_scheduler *sched;
>>>>>>>>         struct drm_sched_job *job;
>>>>>>>>     +    int idx;
>>>>>>>> +
>>>>>>>>         sched = container_of(work, struct drm_gpu_scheduler,
>>>>>>>> work_tdr.work);
>>>>>>>>     +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>>>>> scheduler:%s",
>>>>>>>> +             __func__, sched->name);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         /* Protects against concurrent deletion in
>>>>>>>> drm_sched_get_cleanup_job */
>>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>>         job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>>>>> work_struct *work)
>>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>>         drm_sched_start_timeout(sched);
>>>>>>>>         spin_unlock(&sched->job_list_lock);
>>>>>>>> +
>>>>>>>> +    drm_dev_exit(idx);
>>>>>>>>     }
>>>>>>>>        /**
>>>>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>                unsigned hw_submission,
>>>>>>>>                unsigned hang_limit,
>>>>>>>>                long timeout,
>>>>>>>> -           const char *name)
>>>>>>>> +           const char *name,
>>>>>>>> +           struct drm_device *ddev)
>>>>>>>>     {
>>>>>>>>         int i, ret;
>>>>>>>>         sched->ops = ops;
>>>>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>         sched->name = name;
>>>>>>>>         sched->timeout = timeout;
>>>>>>>>         sched->hang_limit = hang_limit;
>>>>>>>> +    sched->ddev = ddev;
>>>>>>>>         for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>>>>> i++)
>>>>>>>>             drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>>>>     diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> index 0747614..f5076e5 100644
>>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_bin_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_bin");
>>>>>>>> +                 "v3d_bin",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>>>>> %d.", ret);
>>>>>>>>             return ret;
>>>>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_render_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_render");
>>>>>>>> +                 "v3d_render",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>>>>> %d.",
>>>>>>>>                 ret);
>>>>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_tfu_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_tfu");
>>>>>>>> +                 "v3d_tfu",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>>>>                 ret);
>>>>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                          &v3d_csd_sched_ops,
>>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                     "v3d_csd");
>>>>>>>> +                     "v3d_csd",
>>>>>>>> +                     &v3d->drm);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>>>>> %d.",
>>>>>>>>                     ret);
>>>>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                          &v3d_cache_clean_sched_ops,
>>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                     "v3d_cache_clean");
>>>>>>>> +                     "v3d_cache_clean",
>>>>>>>> +                     &v3d->drm);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>>>>> scheduler: %d.",
>>>>>>>>                     ret);
>>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>>> index 9243655..a980709 100644
>>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>>> @@ -32,6 +32,7 @@
>>>>>>>>       struct drm_gpu_scheduler;
>>>>>>>>     struct drm_sched_rq;
>>>>>>>> +struct drm_device;
>>>>>>>>       /* These are often used as an (initial) index
>>>>>>>>      * to an array, and as such should start at 0.
>>>>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>>>>      * @score: score to help loadbalancer pick a idle sched
>>>>>>>>      * @ready: marks if the underlying HW is ready to work
>>>>>>>>      * @free_guilty: A hit to time out handler to free the guilty job.
>>>>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>>>>      *
>>>>>>>>      * One scheduler is implemented for each hardware ring.
>>>>>>>>      */
>>>>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>>>>         atomic_t                        score;
>>>>>>>>         bool                ready;
>>>>>>>>         bool                free_guilty;
>>>>>>>> +    struct drm_device        *ddev;
>>>>>>>>     };
>>>>>>>>       int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>                const struct drm_sched_backend_ops *ops,
>>>>>>>>                uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>>>>> -           const char *name);
>>>>>>>> +           const char *name,
>>>>>>>> +           struct drm_device *ddev);
>>>>>>>>       void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>>     int drm_sched_job_init(struct drm_sched_job *job,

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged.
@ 2020-11-24 17:44                   ` Luben Tuikov
  0 siblings, 0 replies; 212+ messages in thread
From: Luben Tuikov @ 2020-11-24 17:44 UTC (permalink / raw)
  To: christian.koenig, Andrey Grodzovsky, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

On 2020-11-24 12:40 p.m., Christian König wrote:
> Am 24.11.20 um 18:11 schrieb Luben Tuikov:
>> On 2020-11-24 2:50 a.m., Christian König wrote:
>>> Am 24.11.20 um 02:12 schrieb Luben Tuikov:
>>>> On 2020-11-23 3:06 a.m., Christian König wrote:
>>>>> Am 23.11.20 um 06:37 schrieb Andrey Grodzovsky:
>>>>>> On 11/22/20 6:57 AM, Christian König wrote:
>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>> No point in trying recovery if the device is gone; it's meaningless.
>>>>>>> I think that this should go into the device specific recovery
>>>>>>> function and not in the scheduler.
>>>>>> The timeout timer is rearmed here, so this prevents any new recovery
>>>>>> work from restarting from here
>>>>>> after drm_dev_unplug was executed from amdgpu_pci_remove. It will not
>>>>>> cover other places like
>>>>>> job cleanup or starting a new job, but those should stop once the
>>>>>> scheduler thread is stopped later.
>>>>> Yeah, but this is rather unclean. We should probably return an error
>>>>> code instead if the timer should be rearmed or not.
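
A rough sketch of that direction (purely illustrative - the enum and the
callback change shown here are invented for this sketch, not existing
drm_sched API):

```c
/* Let the driver's timeout handler report device state instead of
 * guarding in the scheduler core with drm_dev_enter(). */
enum example_sched_stat {
	EXAMPLE_SCHED_STAT_NOMINAL,	/* recovery ran; rearm the timeout timer */
	EXAMPLE_SCHED_STAT_ENODEV,	/* device is gone; leave the timer off */
};

/* drm_sched_backend_ops.timedout_job() would return this status
 * instead of void, and drm_sched_job_timedout() would call
 * drm_sched_start_timeout() again only in the NOMINAL case. */
```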
>>>> Christian, this is exactly my work I told you about
>>>> last week on Wednesday in our weekly meeting. And
>>>> which I wrote to you in an email last year about this
>>>> time.
>>> Yeah, that's why I'm suggesting it here as well.
>> It seems you're suggesting that Andrey do it, while
>> all too well you know I've been working on this
>> for some time now.
> 
> Changing the return value is just a minimal change and I didn't want to 
> block Andrey in any way.
> 

But it is the suggestion I had last year this time.
It is the whole root of my changes--it's a gamechanger.

>>
>> I wrote you about this last year same time
>> in an email. And I discussed it on the Wednesday
>> meeting.
>>
>> You could've mentioned that here the first time.
>>
>>>> So what do we do now?
>>> Split your patches into smaller parts and submit them chunk by chunk.
>>>
>>> E.g. renames first and then functional changes grouped by area they change.
>> I have, but my final patch, a tiny one which implements
>> the core reason for the change, seems buggy, and I'm looking
>> for a way to debug it.
> 
> Just send it out in chunks, e.g. non functional changes like renames 
> shouldn't cause any problems and having them in the branch early 
> minimizes conflicts with work from others.

Yeah, I agree, that's a good idea.

My final tiny patch is causing me grief and I'd rather
have had it working. :'-(

Regards,
Luben

> 
> Regards,
> Christian.
> 
>>
>> Regards,
>> Luben
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> I can submit those changes without the last part,
>>>> which builds on this change.
>>>>
>>>> I'm still testing the last part and was hoping
>>>> to submit it all in one sequence of patches,
>>>> after my testing.
>>>>
>>>> Regards,
>>>> Luben
>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  2 +-
>>>>>>>>     drivers/gpu/drm/etnaviv/etnaviv_sched.c   |  3 ++-
>>>>>>>>     drivers/gpu/drm/lima/lima_sched.c         |  3 ++-
>>>>>>>>     drivers/gpu/drm/panfrost/panfrost_job.c   |  2 +-
>>>>>>>>     drivers/gpu/drm/scheduler/sched_main.c    | 15 ++++++++++++++-
>>>>>>>>     drivers/gpu/drm/v3d/v3d_sched.c           | 15 ++++++++++-----
>>>>>>>>     include/drm/gpu_scheduler.h               |  6 +++++-
>>>>>>>>     7 files changed, 35 insertions(+), 11 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> index d56f402..d0b0021 100644
>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>>>> @@ -487,7 +487,7 @@ int amdgpu_fence_driver_init_ring(struct
>>>>>>>> amdgpu_ring *ring,
>>>>>>>>               r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
>>>>>>>>                        num_hw_submission, amdgpu_job_hang_limit,
>>>>>>>> -                   timeout, ring->name);
>>>>>>>> +                   timeout, ring->name, &adev->ddev);
>>>>>>>>             if (r) {
>>>>>>>>                 DRM_ERROR("Failed to create scheduler on ring %s.\n",
>>>>>>>>                       ring->name);
>>>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> index cd46c88..7678287 100644
>>>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>>>>>>>> @@ -185,7 +185,8 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu)
>>>>>>>>           ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops,
>>>>>>>>                      etnaviv_hw_jobs_limit, etnaviv_job_hang_limit,
>>>>>>>> -                 msecs_to_jiffies(500), dev_name(gpu->dev));
>>>>>>>> +                 msecs_to_jiffies(500), dev_name(gpu->dev),
>>>>>>>> +                 gpu->drm);
>>>>>>>>         if (ret)
>>>>>>>>             return ret;
>>>>>>>>     diff --git a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> index dc6df9e..8a7e5d7ca 100644
>>>>>>>> --- a/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/lima/lima_sched.c
>>>>>>>> @@ -505,7 +505,8 @@ int lima_sched_pipe_init(struct lima_sched_pipe
>>>>>>>> *pipe, const char *name)
>>>>>>>>           return drm_sched_init(&pipe->base, &lima_sched_ops, 1,
>>>>>>>>                       lima_job_hang_limit, msecs_to_jiffies(timeout),
>>>>>>>> -                  name);
>>>>>>>> +                  name,
>>>>>>>> +                  pipe->ldev->ddev);
>>>>>>>>     }
>>>>>>>>       void lima_sched_pipe_fini(struct lima_sched_pipe *pipe)
>>>>>>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> index 30e7b71..37b03b01 100644
>>>>>>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>>>>>>> @@ -520,7 +520,7 @@ int panfrost_job_init(struct panfrost_device
>>>>>>>> *pfdev)
>>>>>>>>             ret = drm_sched_init(&js->queue[j].sched,
>>>>>>>>                          &panfrost_sched_ops,
>>>>>>>>                          1, 0, msecs_to_jiffies(500),
>>>>>>>> -                     "pan_js");
>>>>>>>> +                     "pan_js", pfdev->ddev);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(pfdev->dev, "Failed to create scheduler: %d.",
>>>>>>>> ret);
>>>>>>>>                 goto err_sched;
>>>>>>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> index c3f0bd0..95db8c6 100644
>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>> @@ -53,6 +53,7 @@
>>>>>>>>     #include <drm/drm_print.h>
>>>>>>>>     #include <drm/gpu_scheduler.h>
>>>>>>>>     #include <drm/spsc_queue.h>
>>>>>>>> +#include <drm/drm_drv.h>
>>>>>>>>       #define CREATE_TRACE_POINTS
>>>>>>>>     #include "gpu_scheduler_trace.h"
>>>>>>>> @@ -283,8 +284,16 @@ static void drm_sched_job_timedout(struct
>>>>>>>> work_struct *work)
>>>>>>>>         struct drm_gpu_scheduler *sched;
>>>>>>>>         struct drm_sched_job *job;
>>>>>>>>     +    int idx;
>>>>>>>> +
>>>>>>>>         sched = container_of(work, struct drm_gpu_scheduler,
>>>>>>>> work_tdr.work);
>>>>>>>>     +    if (!drm_dev_enter(sched->ddev, &idx)) {
>>>>>>>> +        DRM_INFO("%s - device unplugged skipping recovery on
>>>>>>>> scheduler:%s",
>>>>>>>> +             __func__, sched->name);
>>>>>>>> +        return;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         /* Protects against concurrent deletion in
>>>>>>>> drm_sched_get_cleanup_job */
>>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>>         job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>>> @@ -316,6 +325,8 @@ static void drm_sched_job_timedout(struct
>>>>>>>> work_struct *work)
>>>>>>>>         spin_lock(&sched->job_list_lock);
>>>>>>>>         drm_sched_start_timeout(sched);
>>>>>>>>         spin_unlock(&sched->job_list_lock);
>>>>>>>> +
>>>>>>>> +    drm_dev_exit(idx);
>>>>>>>>     }
>>>>>>>>        /**
>>>>>>>> @@ -845,7 +856,8 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>                unsigned hw_submission,
>>>>>>>>                unsigned hang_limit,
>>>>>>>>                long timeout,
>>>>>>>> -           const char *name)
>>>>>>>> +           const char *name,
>>>>>>>> +           struct drm_device *ddev)
>>>>>>>>     {
>>>>>>>>         int i, ret;
>>>>>>>>         sched->ops = ops;
>>>>>>>> @@ -853,6 +865,7 @@ int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>         sched->name = name;
>>>>>>>>         sched->timeout = timeout;
>>>>>>>>         sched->hang_limit = hang_limit;
>>>>>>>> +    sched->ddev = ddev;
>>>>>>>>         for (i = DRM_SCHED_PRIORITY_MIN; i < DRM_SCHED_PRIORITY_COUNT;
>>>>>>>> i++)
>>>>>>>>             drm_sched_rq_init(sched, &sched->sched_rq[i]);
>>>>>>>>     diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> index 0747614..f5076e5 100644
>>>>>>>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>>>>>>>> @@ -401,7 +401,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_bin_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_bin");
>>>>>>>> +                 "v3d_bin",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create bin scheduler:
>>>>>>>> %d.", ret);
>>>>>>>>             return ret;
>>>>>>>> @@ -411,7 +412,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_render_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_render");
>>>>>>>> +                 "v3d_render",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create render scheduler:
>>>>>>>> %d.",
>>>>>>>>                 ret);
>>>>>>>> @@ -423,7 +425,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                      &v3d_tfu_sched_ops,
>>>>>>>>                      hw_jobs_limit, job_hang_limit,
>>>>>>>>                      msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                 "v3d_tfu");
>>>>>>>> +                 "v3d_tfu",
>>>>>>>> +                 &v3d->drm);
>>>>>>>>         if (ret) {
>>>>>>>>             dev_err(v3d->drm.dev, "Failed to create TFU scheduler: %d.",
>>>>>>>>                 ret);
>>>>>>>> @@ -436,7 +439,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                          &v3d_csd_sched_ops,
>>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                     "v3d_csd");
>>>>>>>> +                     "v3d_csd",
>>>>>>>> +                     &v3d->drm);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CSD scheduler:
>>>>>>>> %d.",
>>>>>>>>                     ret);
>>>>>>>> @@ -448,7 +452,8 @@ v3d_sched_init(struct v3d_dev *v3d)
>>>>>>>>                          &v3d_cache_clean_sched_ops,
>>>>>>>>                          hw_jobs_limit, job_hang_limit,
>>>>>>>>                          msecs_to_jiffies(hang_limit_ms),
>>>>>>>> -                     "v3d_cache_clean");
>>>>>>>> +                     "v3d_cache_clean",
>>>>>>>> +                     &v3d->drm);
>>>>>>>>             if (ret) {
>>>>>>>>                 dev_err(v3d->drm.dev, "Failed to create CACHE_CLEAN
>>>>>>>> scheduler: %d.",
>>>>>>>>                     ret);
>>>>>>>> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
>>>>>>>> index 9243655..a980709 100644
>>>>>>>> --- a/include/drm/gpu_scheduler.h
>>>>>>>> +++ b/include/drm/gpu_scheduler.h
>>>>>>>> @@ -32,6 +32,7 @@
>>>>>>>>       struct drm_gpu_scheduler;
>>>>>>>>     struct drm_sched_rq;
>>>>>>>> +struct drm_device;
>>>>>>>>       /* These are often used as an (initial) index
>>>>>>>>      * to an array, and as such should start at 0.
>>>>>>>> @@ -267,6 +268,7 @@ struct drm_sched_backend_ops {
>>>>>>>>      * @score: score to help loadbalancer pick a idle sched
>>>>>>>>      * @ready: marks if the underlying HW is ready to work
>>>>>>>>      * @free_guilty: A hit to time out handler to free the guilty job.
>>>>>>>> + * @ddev: Pointer to drm device of this scheduler.
>>>>>>>>      *
>>>>>>>>      * One scheduler is implemented for each hardware ring.
>>>>>>>>      */
>>>>>>>> @@ -288,12 +290,14 @@ struct drm_gpu_scheduler {
>>>>>>>>         atomic_t                        score;
>>>>>>>>         bool                ready;
>>>>>>>>         bool                free_guilty;
>>>>>>>> +    struct drm_device        *ddev;
>>>>>>>>     };
>>>>>>>>       int drm_sched_init(struct drm_gpu_scheduler *sched,
>>>>>>>>                const struct drm_sched_backend_ops *ops,
>>>>>>>>                uint32_t hw_submission, unsigned hang_limit, long timeout,
>>>>>>>> -           const char *name);
>>>>>>>> +           const char *name,
>>>>>>>> +           struct drm_device *ddev);
>>>>>>>>       void drm_sched_fini(struct drm_gpu_scheduler *sched);
>>>>>>>>     int drm_sched_job_init(struct drm_sched_job *job,

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
  2020-11-24 14:49     ` Daniel Vetter
@ 2020-11-24 22:27       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-24 22:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: gregkh, ckoenig.leichtzumerken, dri-devel, amd-gfx,
	daniel.vetter, Alexander.Deucher, yuq825


On 11/24/20 9:49 AM, Daniel Vetter wrote:
> On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
>> Avoids NULL ptr due to kobj->sd being unset on device removal.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
>>   2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> index caf828a..812e592 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> @@ -27,6 +27,7 @@
>>   #include <linux/uaccess.h>
>>   #include <linux/reboot.h>
>>   #include <linux/syscalls.h>
>> +#include <drm/drm_drv.h>
>>   
>>   #include "amdgpu.h"
>>   #include "amdgpu_ras.h"
>> @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
>>   		.attrs = attrs,
>>   	};
>>   
>> -	sysfs_remove_group(&adev->dev->kobj, &group);
>> +	if (!drm_dev_is_unplugged(&adev->ddev))
>> +		sysfs_remove_group(&adev->dev->kobj, &group);
> This looks wrong. sysfs, like any other interface, should be
> unconditionally thrown out when we do the drm_dev_unregister. Whether
> hotunplugged or not shouldn't matter at all. Either this isn't needed at all,
> or something is wrong with the ordering here. But definitely fishy.
> -Daniel


So technically this is needed because the kobject's sysfs directory
entry kobj->sd is set to NULL on device removal (from sysfs_remove_dir),
but because we don't finalize the device until the last reference to the
drm file is dropped (which can happen later), we end up calling
sysfs_remove_file/dir after this pointer is NULL. sysfs_remove_file
checks for NULL and aborts, while sysfs_remove_dir does not, and that is
why I guard against calls to sysfs_remove_dir.
But indeed the whole approach in the driver is incorrect, as Greg
pointed out - we should use default groups attributes instead of
explicit calls to the sysfs interface, and this would save us those
troubles.
But again, the issue here is scope of work: converting all of amdgpu to
default groups attributes is a somewhat lengthy process with extra
testing, as the entire driver is papered with sysfs references, and it
seems to me more of a standalone cleanup, just like the switch to devm_
and drmm_. To me at least it makes more sense to finalize and push the
hot unplug patches so that this new functionality can be part of the
driver sooner, and then incrementally improve it by working on those
other topics. Just as with devm_/drmm_, I also added sysfs cleanup to my
TODO list in the RFC patch.
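
For reference, the default-groups direction would look roughly like this
(a sketch only - the attribute and driver names are invented for
illustration, not taken from amdgpu):

```c
static struct attribute *example_fw_attrs[] = {
	&dev_attr_fw_version.attr,	/* hypothetical device attribute */
	NULL,
};
ATTRIBUTE_GROUPS(example_fw);		/* generates example_fw_groups */

/* With dev_groups set, the driver core creates the sysfs files after
 * probe() and removes them itself around remove(), so the driver never
 * calls sysfs_remove_group() and cannot race with hot unplug. */
static struct pci_driver example_pci_driver = {
	.name			= "example",
	.driver.dev_groups	= example_fw_groups,
};
```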

Andrey


>
>>   
>>   	return 0;
>>   }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> index 2b7c90b..54331fc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> @@ -24,6 +24,7 @@
>>   #include <linux/firmware.h>
>>   #include <linux/slab.h>
>>   #include <linux/module.h>
>> +#include <drm/drm_drv.h>
>>   
>>   #include "amdgpu.h"
>>   #include "amdgpu_ucode.h"
>> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
>>   
>>   void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
>>   {
>> -	sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>> +	if (!drm_dev_is_unplugged(&adev->ddev))
>> +		sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>>   }
>>   
>>   static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
>> -- 
>> 2.7.4
>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
@ 2020-11-24 22:27       ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-24 22:27 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: robh, gregkh, ckoenig.leichtzumerken, dri-devel, eric, ppaalanen,
	amd-gfx, daniel.vetter, Alexander.Deucher, yuq825,
	Harry.Wentland, l.stach


On 11/24/20 9:49 AM, Daniel Vetter wrote:
> On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
>> Avoids NULL ptr due to kobj->sd being unset on device removal.
>>
>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
>>   2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> index caf828a..812e592 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>> @@ -27,6 +27,7 @@
>>   #include <linux/uaccess.h>
>>   #include <linux/reboot.h>
>>   #include <linux/syscalls.h>
>> +#include <drm/drm_drv.h>
>>   
>>   #include "amdgpu.h"
>>   #include "amdgpu_ras.h"
>> @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
>>   		.attrs = attrs,
>>   	};
>>   
>> -	sysfs_remove_group(&adev->dev->kobj, &group);
>> +	if (!drm_dev_is_unplugged(&adev->ddev))
>> +		sysfs_remove_group(&adev->dev->kobj, &group);
> This looks wrong. sysfs, like any other interface, should be
> unconditionally thrown out when we do the drm_dev_unregister. Whether
> hotunplugged or not shouldn't matter at all. Either this isn't needed at all,
> or something is wrong with the ordering here. But definitely fishy.
> -Daniel


So technically this is needed because the kobject's sysfs directory entry,
kobj->sd, is set to NULL on device removal (from sysfs_remove_dir), but
because we don't finalize the device until the last reference to the drm
file is dropped (which can happen later), we end up calling
sysfs_remove_file/dir after this pointer is NULL. sysfs_remove_file checks
for NULL and aborts, while sysfs_remove_dir does not, and that is why I
guard against calls to sysfs_remove_dir.
But indeed the whole approach in the driver is incorrect; as Greg pointed
out, we should use default groups attributes instead of explicit calls to
the sysfs interface, and this would save us these troubles.
But again, the issue here is scope of work: converting all of amdgpu to
default groups attributes is a somewhat lengthy process with extra testing,
as the entire driver is peppered with sysfs references, and it seems to me
more of a standalone cleanup, just like the switch to devm_ and drmm_. To
me at least it makes more sense to finalize and push the hot unplug patches
so that this new functionality can be part of the driver sooner, and then
incrementally improve it by working on those other topics. Just as with
devm_/drmm_, I also added sysfs cleanup to my TODO list in the RFC patch.

Andrey


>
>>   
>>   	return 0;
>>   }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> index 2b7c90b..54331fc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>> @@ -24,6 +24,7 @@
>>   #include <linux/firmware.h>
>>   #include <linux/slab.h>
>>   #include <linux/module.h>
>> +#include <drm/drm_drv.h>
>>   
>>   #include "amdgpu.h"
>>   #include "amdgpu_ucode.h"
>> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
>>   
>>   void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
>>   {
>> -	sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>> +	if (!drm_dev_is_unplugged(&adev->ddev))
>> +		sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>>   }
>>   
>>   static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
>> -- 
>> 2.7.4
>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
  2020-11-24 22:27       ` Andrey Grodzovsky
@ 2020-11-25  9:04         ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-25  9:04 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: amd-gfx list, Christian König, dri-devel, Qiang Yu, Greg KH,
	Alex Deucher

On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
> On 11/24/20 9:49 AM, Daniel Vetter wrote:
> > On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
> >> Avoids NULL ptr due to kobj->sd being unset on device removal.
> >>
> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
> >>   2 files changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> index caf828a..812e592 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> @@ -27,6 +27,7 @@
> >>   #include <linux/uaccess.h>
> >>   #include <linux/reboot.h>
> >>   #include <linux/syscalls.h>
> >> +#include <drm/drm_drv.h>
> >>
> >>   #include "amdgpu.h"
> >>   #include "amdgpu_ras.h"
> >> @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
> >>              .attrs = attrs,
> >>      };
> >>
> >> -    sysfs_remove_group(&adev->dev->kobj, &group);
> >> +    if (!drm_dev_is_unplugged(&adev->ddev))
> >> +            sysfs_remove_group(&adev->dev->kobj, &group);
> > This looks wrong. sysfs, like any other interface, should be
> > unconditionally thrown out when we do the drm_dev_unregister. Whether
> > hotunplugged or not shouldn't matter at all. Either this isn't needed at all,
> > or something is wrong with the ordering here. But definitely fishy.
> > -Daniel
>
>
> So technically this is needed because the kobject's sysfs directory entry,
> kobj->sd, is set to NULL on device removal (from sysfs_remove_dir), but
> because we don't finalize the device until the last reference to the drm
> file is dropped (which can happen later), we end up calling
> sysfs_remove_file/dir after this pointer is NULL. sysfs_remove_file checks
> for NULL and aborts, while sysfs_remove_dir does not, and that is why I
> guard against calls to sysfs_remove_dir.
> But indeed the whole approach in the driver is incorrect; as Greg pointed
> out, we should use default groups attributes instead of explicit calls to
> the sysfs interface, and this would save us these troubles.
> But again, the issue here is scope of work: converting all of amdgpu to
> default groups attributes is a somewhat lengthy process with extra testing,
> as the entire driver is peppered with sysfs references, and it seems to me
> more of a standalone cleanup, just like the switch to devm_ and drmm_. To
> me at least it makes more sense to finalize and push the hot unplug patches
> so that this new functionality can be part of the driver sooner, and then
> incrementally improve it by working on those other topics. Just as with
> devm_/drmm_, I also added sysfs cleanup to my TODO list in the RFC patch.

Hm, whether you solve this with the default group stuff to
auto-remove, or remove explicitly at the right time doesn't matter
much. The underlying problem you have here is that it's done way too
late. sysfs (like all uapi interfaces) needs to be removed as
part of drm_dev_unregister. I guess aside from the split into fini_hw
and fini_sw, you also need an unregister_late callback (like we have
already for drm_connector, so that e.g. backlight and similar stuff
can be unregistered).

Papering over the underlying bug like this doesn't really fix much;
the lifetimes are still wrong.
-Daniel

>
> Andrey
>
>
> >
> >>
> >>      return 0;
> >>   }
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> index 2b7c90b..54331fc 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> @@ -24,6 +24,7 @@
> >>   #include <linux/firmware.h>
> >>   #include <linux/slab.h>
> >>   #include <linux/module.h>
> >> +#include <drm/drm_drv.h>
> >>
> >>   #include "amdgpu.h"
> >>   #include "amdgpu_ucode.h"
> >> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
> >>
> >>   void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
> >>   {
> >> -    sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> >> +    if (!drm_dev_is_unplugged(&adev->ddev))
> >> +            sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> >>   }
> >>
> >>   static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
> >> --
> >> 2.7.4
> >>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
@ 2020-11-25  9:04         ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-25  9:04 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Rob Herring, amd-gfx list, Christian König, dri-devel,
	Anholt, Eric, Pekka Paalanen, Qiang Yu, Greg KH, Alex Deucher,
	Wentland, Harry, Lucas Stach

On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
> On 11/24/20 9:49 AM, Daniel Vetter wrote:
> > On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
> >> Avoids NULL ptr due to kobj->sd being unset on device removal.
> >>
> >> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
> >>   2 files changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> index caf828a..812e592 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> @@ -27,6 +27,7 @@
> >>   #include <linux/uaccess.h>
> >>   #include <linux/reboot.h>
> >>   #include <linux/syscalls.h>
> >> +#include <drm/drm_drv.h>
> >>
> >>   #include "amdgpu.h"
> >>   #include "amdgpu_ras.h"
> >> @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
> >>              .attrs = attrs,
> >>      };
> >>
> >> -    sysfs_remove_group(&adev->dev->kobj, &group);
> >> +    if (!drm_dev_is_unplugged(&adev->ddev))
> >> +            sysfs_remove_group(&adev->dev->kobj, &group);
> > This looks wrong. sysfs, like any other interface, should be
> > unconditionally thrown out when we do the drm_dev_unregister. Whether
> > hotunplugged or not shouldn't matter at all. Either this isn't needed at all,
> > or something is wrong with the ordering here. But definitely fishy.
> > -Daniel
>
>
> So technically this is needed because the kobject's sysfs directory entry,
> kobj->sd, is set to NULL on device removal (from sysfs_remove_dir), but
> because we don't finalize the device until the last reference to the drm
> file is dropped (which can happen later), we end up calling
> sysfs_remove_file/dir after this pointer is NULL. sysfs_remove_file checks
> for NULL and aborts, while sysfs_remove_dir does not, and that is why I
> guard against calls to sysfs_remove_dir.
> But indeed the whole approach in the driver is incorrect; as Greg pointed
> out, we should use default groups attributes instead of explicit calls to
> the sysfs interface, and this would save us these troubles.
> But again, the issue here is scope of work: converting all of amdgpu to
> default groups attributes is a somewhat lengthy process with extra testing,
> as the entire driver is peppered with sysfs references, and it seems to me
> more of a standalone cleanup, just like the switch to devm_ and drmm_. To
> me at least it makes more sense to finalize and push the hot unplug patches
> so that this new functionality can be part of the driver sooner, and then
> incrementally improve it by working on those other topics. Just as with
> devm_/drmm_, I also added sysfs cleanup to my TODO list in the RFC patch.

Hm, whether you solve this with the default group stuff to
auto-remove, or remove explicitly at the right time doesn't matter
much. The underlying problem you have here is that it's done way too
late. sysfs (like all uapi interfaces) needs to be removed as
part of drm_dev_unregister. I guess aside from the split into fini_hw
and fini_sw, you also need an unregister_late callback (like we have
already for drm_connector, so that e.g. backlight and similar stuff
can be unregistered).

Papering over the underlying bug like this doesn't really fix much;
the lifetimes are still wrong.
-Daniel

>
> Andrey
>
>
> >
> >>
> >>      return 0;
> >>   }
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> index 2b7c90b..54331fc 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> @@ -24,6 +24,7 @@
> >>   #include <linux/firmware.h>
> >>   #include <linux/slab.h>
> >>   #include <linux/module.h>
> >> +#include <drm/drm_drv.h>
> >>
> >>   #include "amdgpu.h"
> >>   #include "amdgpu_ucode.h"
> >> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
> >>
> >>   void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
> >>   {
> >> -    sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> >> +    if (!drm_dev_is_unplugged(&adev->ddev))
> >> +            sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> >>   }
> >>
> >>   static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
> >> --
> >> 2.7.4
> >>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-24 16:44                     ` Christian König
@ 2020-11-25 10:40                       ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-25 10:40 UTC (permalink / raw)
  To: christian.koenig
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher, yuq825

On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > 
> > On 11/24/20 2:41 AM, Christian König wrote:
> > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > 
> > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > 
> > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > 
> > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > 
> > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > what you are doing with this function.
> > > > > > > > > 
> > > > > > > > > Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > notifier per device." to see
> > > > > > > > how I use it. I don't see why this should go
> > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > registered from within TTM
> > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > 
> > > > > > > No, that is really vendor specific.
> > > > > > > 
> > > > > > > What I meant is to have a function like
> > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > to call and all tt objects are unpopulated.
> > > > > > 
> > > > > > 
> > > > > > So instead of this BO list i create and later iterate in
> > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > within
> > > > > > TTM with a single function ? Makes much more sense.
> > > > > 
> > > > > Yes, exactly.
> > > > > 
> > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > actually not the best idea, we should now check the
> > > > > pin_count instead. This way we could also have a list of the
> > > > > pinned BOs in TTM.
> > > > 
> > > > 
> > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > It's probably a good idea to combine both iterations into this
> > > > new function to cover all the BOs allocated on the device.
> > > 
> > > Yes, that's what I had in my mind as well.
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > could have unforeseen consequences.
> > > > 
> > > > 
> > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > accesses once we drop all the DMA backing pages for a particular
> > > > BO ? Because for user mappings
> > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > nothing was done for in kernel CPU mappings.
> > > 
> > > Yes exactly that.
> > > 
> > > In other words what happens if we free the ring buffer while the
> > > kernel still writes to it?
> > > 
> > > Christian.
> > 
> > 
> > While we can't control user application accesses to the mapped buffers
> > explicitly and hence we use page fault rerouting
> > I am thinking that in this case we may be able to sprinkle
> > drm_dev_enter/exit in any such sensitive place where we might
> > CPU-access a DMA buffer from the kernel ?
> 
> Yes, I fear we are going to need that.

Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
could stuff this into begin/end_cpu_access (but only for the kernel, so a
bit tricky)?

btw the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part isn't
yet solved in your series here.
-Daniel

> 
> > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > Are there other places ?
> 
> Puh, good question. I have no idea.
> 
> > Another point is that at this point the driver shouldn't access any such
> > buffers as we are at the process finishing the device.
> > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > think there is anything else to do ?
> 
> Well there is a page fault handler for kernel mappings, but that one just
> prints the stack trace into the system log and calls BUG(); :)
> 
> Long story short we need to avoid any access to released pages after unplug.
> No matter if it's from the kernel or userspace.
> 
> Regards,
> Christian.
> 
> > 
> > Andrey
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-11-25 10:40                       ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-25 10:40 UTC (permalink / raw)
  To: christian.koenig
  Cc: robh, Andrey Grodzovsky, daniel.vetter, dri-devel, eric,
	ppaalanen, amd-gfx, gregkh, Alexander.Deucher, yuq825,
	Harry.Wentland, l.stach

On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > 
> > On 11/24/20 2:41 AM, Christian König wrote:
> > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > 
> > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > 
> > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > 
> > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > 
> > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > what you are doing with this function.
> > > > > > > > > 
> > > > > > > > > Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > notifier per device." to see
> > > > > > > > how I use it. I don't see why this should go
> > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > registered from within TTM
> > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > 
> > > > > > > No, that is really vendor specific.
> > > > > > > 
> > > > > > > What I meant is to have a function like
> > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > to call and all tt objects are unpopulated.
> > > > > > 
> > > > > > 
> > > > > > So instead of this BO list i create and later iterate in
> > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > within
> > > > > > TTM with a single function ? Makes much more sense.
> > > > > 
> > > > > Yes, exactly.
> > > > > 
> > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > actually not the best idea, we should now check the
> > > > > pin_count instead. This way we could also have a list of the
> > > > > pinned BOs in TTM.
> > > > 
> > > > 
> > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > It's probably a good idea to combine both iterations into this
> > > > new function to cover all the BOs allocated on the device.
> > > 
> > > Yes, that's what I had in my mind as well.
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > could have unforeseen consequences.
> > > > 
> > > > 
> > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > accesses once we drop all the DMA backing pages for a particular
> > > > BO ? Because for user mappings
> > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > nothing was done for in kernel CPU mappings.
> > > 
> > > Yes exactly that.
> > > 
> > > In other words what happens if we free the ring buffer while the
> > > kernel still writes to it?
> > > 
> > > Christian.
> > 
> > 
> > While we can't control user application accesses to the mapped buffers
> > explicitly and hence we use page fault rerouting
> > I am thinking that in this case we may be able to sprinkle
> > drm_dev_enter/exit in any such sensitive place where we might
> > CPU-access a DMA buffer from the kernel ?
> 
> Yes, I fear we are going to need that.

Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
could stuff this into begin/end_cpu_access (but only for the kernel, so a
bit tricky)?

btw the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part isn't
yet solved in your series here.
-Daniel

> 
> > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > Are there other places ?
> 
> Puh, good question. I have no idea.
> 
> > Another point is that at this point the driver shouldn't access any such
> > buffers as we are at the process finishing the device.
> > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > think there is anything else to do ?
> 
> Well there is a page fault handler for kernel mappings, but that one just
> prints the stack trace into the system log and calls BUG(); :)
> 
> Long story short we need to avoid any access to released pages after unplug.
> No matter if it's from the kernel or userspace.
> 
> Regards,
> Christian.
> 
> > 
> > Andrey
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late
  2020-11-24 15:51       ` Andrey Grodzovsky
@ 2020-11-25 10:41         ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-25 10:41 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: gregkh, ckoenig.leichtzumerken, dri-devel, amd-gfx,
	daniel.vetter, Alexander.Deucher, yuq825

On Tue, Nov 24, 2020 at 10:51:57AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/24/20 9:53 AM, Daniel Vetter wrote:
> > On Sat, Nov 21, 2020 at 12:21:18AM -0500, Andrey Grodzovsky wrote:
> > > Some of the stuff in amdgpu_device_fini such as HW interrupts
> > > disable and pending fences finilization must be done right away on
> > > pci_remove while most of the stuff which relates to finilizing and
> > > releasing driver data structures can be kept until
> > > drm_driver.release hook is called, i.e. when the last device
> > > reference is dropped.
> > > 
> > Uh, fini_late and fini_early are rather meaningless namings, since it's
> > not clear why there's a split. If you used drm_connector_funcs as inspiration,
> > that's kinda not good because 'register' itself is a reserved keyword.
> > That's why we had to add late_ prefix, could as well have used
> > C_sucks_ as prefix :-) And then the early_unregister for consistency.
> > 
> > I think fini_hw and fini_sw (or maybe fini_drm) would be a lot clearer
> > about what they're doing.
> > 
> > I still strongly recommend that you cut over as much as possible of the
> > fini_hw work to devm_ and for the fini_sw/drm stuff there's drmm_
> > -Daniel
> 
> 
> Definitely, and I put it in a TODO list in the RFC patch. Also, as I
> mentioned before, I just prefer to leave it for follow-up work because
> it's non-trivial and requires shuffling a lot of stuff around in the
> driver. I was thinking of committing the work in incremental steps, so
> it's easier to merge and control for breakages.

Yeah, doing the devm/drmm conversion later on makes sense. I'd still try to
have better names than what you're currently going with; a few of these
will likely stick around for a very long time, not just the interim.
-Daniel

> 
> Andrey
> 
> 
> > 
> > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++++++++++++----
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 ++-----
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++++++++++++++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 12 +++++++++++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
> > >   9 files changed, 65 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > index 83ac06a..6243f6d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > @@ -1063,7 +1063,9 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> > >   int amdgpu_device_init(struct amdgpu_device *adev,
> > >   		       uint32_t flags);
> > > -void amdgpu_device_fini(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_early(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_late(struct amdgpu_device *adev);
> > > +
* Re: [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late
@ 2020-11-25 10:41         ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-25 10:41 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: robh, gregkh, ckoenig.leichtzumerken, dri-devel, eric, ppaalanen,
	amd-gfx, Daniel Vetter, daniel.vetter, Alexander.Deucher, yuq825,
	Harry.Wentland, l.stach

On Tue, Nov 24, 2020 at 10:51:57AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/24/20 9:53 AM, Daniel Vetter wrote:
> > On Sat, Nov 21, 2020 at 12:21:18AM -0500, Andrey Grodzovsky wrote:
> > > Some of the stuff in amdgpu_device_fini such as HW interrupts
> > > disable and pending fences finalization must be done right away on
> > > pci_remove, while most of the stuff which relates to finalizing and
> > > releasing driver data structures can be kept until
> > > drm_driver.release hook is called, i.e. when the last device
> > > reference is dropped.
> > > 
> > Uh, fini_late and fini_early are rather meaningless namings, since it's
> > not clear why there's a split. If you used drm_connector_funcs as inspiration,
> > that's kinda not good because 'register' itself is a reserved keyword.
> > That's why we had to add late_ prefix, could as well have used
> > C_sucks_ as prefix :-) And then the early_unregister for consistency.
> > 
> > I think fini_hw and fini_sw (or maybe fini_drm) would be a lot clearer
> > about what they're doing.
> > 
> > I still strongly recommend that you cut over as much as possible of the
> > fini_hw work to devm_ and for the fini_sw/drm stuff there's drmm_
> > -Daniel
> 
> 
> Definitely, and I put it in a TODO list in the RFC patch. Also, as I
> mentioned before, I just prefer to leave it for follow-up work because
> it's non-trivial and requires shuffling a lot of stuff around in the
> driver. I was thinking of committing the work in incremental steps so
> it's easier to merge it and control for breakages.

Yeah, doing the devm/drmm conversion later on makes sense. I'd still try to
have better names than what you're currently going with. A few of these
will likely stick around for very long, not just in the interim.
-Daniel

> 
> Andrey
> 
> 
> > 
> > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++++++++++++----
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 ++-----
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++++++++++++++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 12 +++++++++++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
> > >   9 files changed, 65 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > index 83ac06a..6243f6d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > @@ -1063,7 +1063,9 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> > >   int amdgpu_device_init(struct amdgpu_device *adev,
> > >   		       uint32_t flags);
> > > -void amdgpu_device_fini(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_early(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_late(struct amdgpu_device *adev);
> > > +
> > >   int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
> > >   void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> > > @@ -1275,6 +1277,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
> > >   int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
> > >   void amdgpu_driver_postclose_kms(struct drm_device *dev,
> > >   				 struct drm_file *file_priv);
> > > +void amdgpu_driver_release_kms(struct drm_device *dev);
> > > +
> > >   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
> > >   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
> > >   int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index 2f60b70..797d94d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -3557,14 +3557,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> > >    * Tear down the driver info (all asics).
> > >    * Called at driver shutdown.
> > >    */
> > > -void amdgpu_device_fini(struct amdgpu_device *adev)
> > > +void amdgpu_device_fini_early(struct amdgpu_device *adev)
> > >   {
> > >   	dev_info(adev->dev, "amdgpu: finishing device.\n");
> > >   	flush_delayed_work(&adev->delayed_init_work);
> > >   	adev->shutdown = true;
> > > -	kfree(adev->pci_state);
> > > -
> > >   	/* make sure IB test finished before entering exclusive mode
> > >   	 * to avoid preemption on IB test
> > >   	 * */
> > > @@ -3581,11 +3579,18 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
> > >   		else
> > >   			drm_atomic_helper_shutdown(adev_to_drm(adev));
> > >   	}
> > > -	amdgpu_fence_driver_fini(adev);
> > > +	amdgpu_fence_driver_fini_early(adev);
> > >   	if (adev->pm_sysfs_en)
> > >   		amdgpu_pm_sysfs_fini(adev);
> > >   	amdgpu_fbdev_fini(adev);
> > > +
> > > +	amdgpu_irq_fini_early(adev);
> > > +}
> > > +
> > > +void amdgpu_device_fini_late(struct amdgpu_device *adev)
> > > +{
> > >   	amdgpu_device_ip_fini(adev);
> > > +	amdgpu_fence_driver_fini_late(adev);
> > >   	release_firmware(adev->firmware.gpu_info_fw);
> > >   	adev->firmware.gpu_info_fw = NULL;
> > >   	adev->accel_working = false;
> > > @@ -3621,6 +3626,9 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
> > >   		amdgpu_pmu_fini(adev);
> > >   	if (adev->mman.discovery_bin)
> > >   		amdgpu_discovery_fini(adev);
> > > +
> > > +	kfree(adev->pci_state);
> > > +
> > >   }
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index 7f98cf1..3d130fc 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -1244,14 +1244,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
> > >   {
> > >   	struct drm_device *dev = pci_get_drvdata(pdev);
> > > -#ifdef MODULE
> > > -	if (THIS_MODULE->state != MODULE_STATE_GOING)
> > > -#endif
> > > -		DRM_ERROR("Hotplug removal is not supported\n");
> > >   	drm_dev_unplug(dev);
> > >   	amdgpu_driver_unload_kms(dev);
> > > +
> > >   	pci_disable_device(pdev);
> > > -	pci_set_drvdata(pdev, NULL);
> > >   	drm_dev_put(dev);
> > >   }
> > > @@ -1557,6 +1553,7 @@ static struct drm_driver kms_driver = {
> > >   	.dumb_create = amdgpu_mode_dumb_create,
> > >   	.dumb_map_offset = amdgpu_mode_dumb_mmap,
> > >   	.fops = &amdgpu_driver_kms_fops,
> > > +	.release = &amdgpu_driver_release_kms,
> > >   	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> > >   	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > index d0b0021..c123aa6 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > > @@ -523,7 +523,7 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
> > >    *
> > >    * Tear down the fence driver for all possible rings (all asics).
> > >    */
> > > -void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
> > > +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev)
> > >   {
> > >   	unsigned i, j;
> > >   	int r;
> > > @@ -544,6 +544,19 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
> > >   		if (!ring->no_scheduler)
> > >   			drm_sched_fini(&ring->sched);
> > >   		del_timer_sync(&ring->fence_drv.fallback_timer);
> > > +	}
> > > +}
> > > +
> > > +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev)
> > > +{
> > > +	unsigned int i, j;
> > > +
> > > +	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
> > > +		struct amdgpu_ring *ring = adev->rings[i];
> > > +
> > > +		if (!ring || !ring->fence_drv.initialized)
> > > +			continue;
> > > +
> > >   		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
> > >   			dma_fence_put(ring->fence_drv.fences[j]);
> > >   		kfree(ring->fence_drv.fences);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> > > index 300ac73..a833197 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
> > > @@ -49,6 +49,7 @@
> > >   #include <drm/drm_irq.h>
> > >   #include <drm/drm_vblank.h>
> > >   #include <drm/amdgpu_drm.h>
> > > +#include <drm/drm_drv.h>
> > >   #include "amdgpu.h"
> > >   #include "amdgpu_ih.h"
> > >   #include "atom.h"
> > > @@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
> > >   	return 0;
> > >   }
> > > +
> > > +void amdgpu_irq_fini_early(struct amdgpu_device *adev)
> > > +{
> > > +	if (adev->irq.installed) {
> > > +		drm_irq_uninstall(&adev->ddev);
> > > +		adev->irq.installed = false;
> > > +		if (adev->irq.msi_enabled)
> > > +			pci_free_irq_vectors(adev->pdev);
> > > +
> > > +		if (!amdgpu_device_has_dc_support(adev))
> > > +			flush_work(&adev->hotplug_work);
> > > +	}
> > > +}
> > > +
> > >   /**
> > >    * amdgpu_irq_fini - shut down interrupt handling
> > >    *
> > > @@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
> > >   {
> > >   	unsigned i, j;
> > > -	if (adev->irq.installed) {
> > > -		drm_irq_uninstall(adev_to_drm(adev));
> > > -		adev->irq.installed = false;
> > > -		if (adev->irq.msi_enabled)
> > > -			pci_free_irq_vectors(adev->pdev);
> > > -		if (!amdgpu_device_has_dc_support(adev))
> > > -			flush_work(&adev->hotplug_work);
> > > -	}
> > > -
> > >   	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
> > >   		if (!adev->irq.client[i].sources)
> > >   			continue;
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> > > index c718e94..718c70f 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
> > > @@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
> > >   int amdgpu_irq_init(struct amdgpu_device *adev);
> > >   void amdgpu_irq_fini(struct amdgpu_device *adev);
> > > +void amdgpu_irq_fini_early(struct amdgpu_device *adev);
> > >   int amdgpu_irq_add_id(struct amdgpu_device *adev,
> > >   		      unsigned client_id, unsigned src_id,
> > >   		      struct amdgpu_irq_src *source);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > > index a0af8a7..9e30c5c 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > > @@ -29,6 +29,7 @@
> > >   #include "amdgpu.h"
> > >   #include <drm/drm_debugfs.h>
> > >   #include <drm/amdgpu_drm.h>
> > > +#include <drm/drm_drv.h>
> > >   #include "amdgpu_sched.h"
> > >   #include "amdgpu_uvd.h"
> > >   #include "amdgpu_vce.h"
> > > @@ -94,7 +95,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
> > >   	}
> > >   	amdgpu_acpi_fini(adev);
> > > -	amdgpu_device_fini(adev);
> > > +	amdgpu_device_fini_early(adev);
> > >   }
> > >   void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
> > > @@ -1147,6 +1148,15 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
> > >   	pm_runtime_put_autosuspend(dev->dev);
> > >   }
> > > +
> > > +void amdgpu_driver_release_kms(struct drm_device *dev)
> > > +{
> > > +	struct amdgpu_device *adev = drm_to_adev(dev);
> > > +
> > > +	amdgpu_device_fini_late(adev);
> > > +	pci_set_drvdata(adev->pdev, NULL);
> > > +}
> > > +
> > >   /*
> > >    * VBlank related functions.
> > >    */
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > > index 9d11b84..caf828a 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > > @@ -2142,9 +2142,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
> > >   {
> > >   	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
> > > +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
> > > +
> > >   	if (!con)
> > >   		return 0;
> > > +
> > >   	/* Need disable ras on all IPs here before ip [hw/sw]fini */
> > >   	amdgpu_ras_disable_all_features(adev, 0);
> > >   	amdgpu_ras_recovery_fini(adev);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > > index 7112137..074f36b 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> > > @@ -107,7 +107,8 @@ struct amdgpu_fence_driver {
> > >   };
> > >   int amdgpu_fence_driver_init(struct amdgpu_device *adev);
> > > -void amdgpu_fence_driver_fini(struct amdgpu_device *adev);
> > > +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev);
> > > +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev);
> > >   void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
> > >   int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
> > > -- 
> > > 2.7.4
> > > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-21  5:21   ` Andrey Grodzovsky
@ 2020-11-25 10:42     ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-25 10:42 UTC (permalink / raw)
  To: Andrey Grodzovsky, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

On 21.11.20 at 06:21, Andrey Grodzovsky wrote:
> It's needed to drop iommu backed pages on device unplug
> before device's IOMMU group is released.

It would be cleaner if we could do the whole handling in TTM. I also 
need to double check what you are doing with this function.

Christian.

>
> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> ---
>   drivers/gpu/drm/ttm/ttm_tt.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index 1ccf1ef..29248a5 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -495,3 +495,4 @@ void ttm_tt_unpopulate(struct ttm_tt *ttm)
>   	else
>   		ttm_pool_unpopulate(ttm);
>   }
> +EXPORT_SYMBOL(ttm_tt_unpopulate);


^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-25 10:40                       ` Daniel Vetter
@ 2020-11-25 12:57                         ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-25 12:57 UTC (permalink / raw)
  To: Daniel Vetter, christian.koenig
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher, yuq825

On 25.11.20 at 11:40, Daniel Vetter wrote:
> On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
>> On 24.11.20 at 17:22, Andrey Grodzovsky wrote:
>>> On 11/24/20 2:41 AM, Christian König wrote:
>>>> On 23.11.20 at 22:08, Andrey Grodzovsky wrote:
>>>>> On 11/23/20 3:41 PM, Christian König wrote:
>>>>>> On 23.11.20 at 21:38, Andrey Grodzovsky wrote:
>>>>>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>>>>>> On 23.11.20 at 21:05, Andrey Grodzovsky wrote:
>>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>>>>>> On 21.11.20 at 06:21, Andrey Grodzovsky wrote:
>>>>>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>>>>>> before device's IOMMU group is released.
>>>>>>>>>> It would be cleaner if we could do the whole
>>>>>>>>>> handling in TTM. I also need to double check
>>>>>>>>>> what you are doing with this function.
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology
>>>>>>>>> notifier per device." to see
>>>>>>>>> how i use it. I don't see why this should go
>>>>>>>>> into TTM mid-layer - the stuff I do inside
>>>>>>>>> is vendor specific and also I don't think TTM is
>>>>>>>>> explicitly aware of IOMMU ?
>>>>>>>>> Do you mean you prefer the IOMMU notifier to be
>>>>>>>>> registered from within TTM
>>>>>>>>> and then use a hook to call into vendor specific handler ?
>>>>>>>> No, that is really vendor specific.
>>>>>>>>
>>>>>>>> What I meant is to have a function like
>>>>>>>> ttm_resource_manager_evict_all() which you only need
>>>>>>>> to call and all tt objects are unpopulated.
>>>>>>>
>>>>>>> So instead of this BO list i create and later iterate in
>>>>>>> amdgpu from the IOMMU patch you just want to do it
>>>>>>> within
>>>>>>> TTM with a single function ? Makes much more sense.
>>>>>> Yes, exactly.
>>>>>>
>>>>>> The list_empty() checks we have in TTM for the LRU are
>>>>>> actually not the best idea, we should now check the
>>>>>> pin_count instead. This way we could also have a list of the
>>>>>> pinned BOs in TTM.
>>>>>
>>>>> So from my IOMMU topology handler I will iterate the TTM LRU for
>>>>> the unpinned BOs and this new function for the pinned ones?
>>>>> It's probably a good idea to combine both iterations into this
>>>>> new function to cover all the BOs allocated on the device.
>>>> Yes, that's what I had in my mind as well.
>>>>
>>>>>
>>>>>> BTW: Have you thought about what happens when we unpopulate
>>>>>> a BO while we still try to use a kernel mapping for it? That
>>>>>> could have unforeseen consequences.
>>>>>
>>>>> Are you asking what happens to kmap or vmap style mapped CPU
>>>>> accesses once we drop all the DMA backing pages for a particular
>>>>> BO? Because for user mappings
>>>>> (mmap) we took care of this with dummy page reroute but indeed
>>>>> nothing was done for in kernel CPU mappings.
>>>> Yes exactly that.
>>>>
>>>> In other words what happens if we free the ring buffer while the
>>>> kernel still writes to it?
>>>>
>>>> Christian.
>>>
>>> While we can't control user application accesses to the mapped buffers
>>> explicitly and hence we use page fault rerouting
>>> I am thinking that in this case we may be able to sprinkle
>>> drm_dev_enter/exit in any such sensitive place where we might
>>> CPU access a DMA buffer from the kernel?
>> Yes, I fear we are going to need that.
> Uh ... problem is that dma_buf_vmap mappings are usually permanent things. Maybe we
> could stuff this into begin/end_cpu_access (but only for the kernel, so a
> bit tricky)?

Oh very very good point! I haven't thought about DMA-buf mmaps in this 
context yet.


> btw the other issue with dma-buf (and even worse with dma_fence) is
> refcounting of the underlying drm_device. I'd expect that all your
> callbacks go boom if the dma_buf outlives your drm_device. That part isn't
> yet solved in your series here.

Well, thinking more about this, it seems to be another really good
argument why mapping pages from DMA-bufs into application address space
directly is a very bad idea :)

But yes, we essentially can't remove the device as long as there is a 
DMA-buf with mappings. No idea how to clean that one up.

Christian.

> -Daniel
>
>>> Things like CPU page table updates, ring buffer accesses and FW memcpy?
>>> Is there other places ?
>> Puh, good question. I have no idea.
>>
>>> Another point is that at this point the driver shouldn't access any such
>>> buffers as we are at the process finishing the device.
>>> AFAIK there is no page fault mechanism for kernel mappings, so I don't
>>> think there is anything else to do?
>> Well there is a page fault handler for kernel mappings, but that one just
>> prints the stack trace into the system log and calls BUG(); :)
>>
>> Long story short we need to avoid any access to released pages after unplug.
>> No matter if it's from the kernel or userspace.
>>
>> Regards,
>> Christian.
>>
>>> Andrey


^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-11-25 12:57                         ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-11-25 12:57 UTC (permalink / raw)
  To: Daniel Vetter, christian.koenig
  Cc: robh, Andrey Grodzovsky, daniel.vetter, dri-devel, eric,
	ppaalanen, amd-gfx, gregkh, Alexander.Deucher, l.stach,
	Harry.Wentland, yuq825

Am 25.11.20 um 11:40 schrieb Daniel Vetter:
> On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
>> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
>>> On 11/24/20 2:41 AM, Christian König wrote:
>>>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
>>>>> On 11/23/20 3:41 PM, Christian König wrote:
>>>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>>>>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>>>>>> before device's IOMMU group is released.
>>>>>>>>>> It would be cleaner if we could do the whole
>>>>>>>>>> handling in TTM. I also need to double check
>>>>>>>>>> what you are doing with this function.
>>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology
>>>>>>>>> notifier per device." to see
>>>>>>>>> how i use it. I don't see why this should go
>>>>>>>>> into TTM mid-layer - the stuff I do inside
>>>>>>>>> is vendor specific and also I don't think TTM is
>>>>>>>>> explicitly aware of IOMMU ?
>>>>>>>>> Do you mean you prefer the IOMMU notifier to be
>>>>>>>>> registered from within TTM
>>>>>>>>> and then use a hook to call into vendor specific handler ?
>>>>>>>> No, that is really vendor specific.
>>>>>>>>
>>>>>>>> What I meant is to have a function like
>>>>>>>> ttm_resource_manager_evict_all() which you only need
>>>>>>>> to call and all tt objects are unpopulated.
>>>>>>>
>>>>>>> So instead of this BO list i create and later iterate in
>>>>>>> amdgpu from the IOMMU patch you just want to do it
>>>>>>> within
>>>>>>> TTM with a single function ? Makes much more sense.
>>>>>> Yes, exactly.
>>>>>>
>>>>>> The list_empty() checks we have in TTM for the LRU are
>>>>>> actually not the best idea, we should now check the
>>>>>> pin_count instead. This way we could also have a list of the
>>>>>> pinned BOs in TTM.
>>>>>
>>>>> So from my IOMMU topology handler I will iterate the TTM LRU for
>>>>> the unpinned BOs and this new function for the pinned ones  ?
>>>>> It's probably a good idea to combine both iterations into this
>>>>> new function to cover all the BOs allocated on the device.
>>>> Yes, that's what I had in my mind as well.
>>>>
>>>>>
>>>>>> BTW: Have you thought about what happens when we unpopulate
>>>>>> a BO while we still try to use a kernel mapping for it? That
>>>>>> could have unforeseen consequences.
>>>>>
>>>>> Are you asking what happens to kmap or vmap style mapped CPU
>>>>> accesses once we drop all the DMA backing pages for a particular
>>>>> BO ? Because for user mappings
>>>>> (mmap) we took care of this with dummy page reroute but indeed
>>>>> nothing was done for in kernel CPU mappings.
>>>> Yes exactly that.
>>>>
>>>> In other words what happens if we free the ring buffer while the
>>>> kernel still writes to it?
>>>>
>>>> Christian.
>>>
>>> While we can't control user application accesses to the mapped buffers
>>> explicitly and hence we use page fault rerouting
>>> I am thinking that in this case we may be able to sprinkle
>>> drm_dev_enter/exit in any such sensitive place where we might
>>> CPU-access a DMA buffer from the kernel ?
>> Yes, I fear we are going to need that.
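Concretely, the "sprinkle drm_dev_enter/exit" idea amounts to wrapping each in-kernel CPU access roughly like this. This is a hedged sketch, not code from the series; the function name and the ring fields used are illustrative assumptions:

```c
/*
 * Hedged sketch of guarding an in-kernel device-memory access with
 * drm_dev_enter()/drm_dev_exit(). example_ring_commit() and the exact
 * ring fields are illustrative, not taken from the patch series.
 */
static void example_ring_commit(struct amdgpu_ring *ring, u32 val)
{
	int idx;

	/* Bail out if the device was already unplugged. */
	if (!drm_dev_enter(adev_to_drm(ring->adev), &idx))
		return;

	/* Device-backed memory may only be touched inside the section. */
	ring->ring[ring->wptr++ & ring->buf_mask] = val;

	drm_dev_exit(idx);
}
```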
> Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
> could stuff this into begin/end_cpu_access (but only for the kernel, so a
> bit tricky)?

Oh very very good point! I haven't thought about DMA-buf mmaps in this 
context yet.


> btw the other issue with dma-buf (and even worse with dma_fence) is
> refcounting of the underlying drm_device. I'd expect that all your
> callbacks go boom if the dma_buf outlives your drm_device. That part isn't
> yet solved in your series here.

Well thinking more about this, it seems to be another really good 
argument why mapping pages from DMA-bufs into application address space 
directly is a very bad idea :)

But yes, we essentially can't remove the device as long as there is a 
DMA-buf with mappings. No idea how to clean that one up.

Christian.

> -Daniel
>
>>> Things like CPU page table updates, ring buffer accesses and FW memcpy ?
>>> Is there other places ?
>> Puh, good question. I have no idea.
>>
>>> Another point is that at this point the driver shouldn't access any such
>>> buffers as we are at the process finishing the device.
>>> AFAIK there is no page fault mechanism for kernel mappings so I don't
>>> think there is anything else to do ?
>> Well there is a page fault handler for kernel mappings, but that one just
>> prints the stack trace into the system log and calls BUG(); :)
>>
>> Long story short we need to avoid any access to released pages after unplug.
>> No matter if it's from the kernel or userspace.
>>
>> Regards,
>> Christian.
>>
>>> Andrey
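For reference, the single TTM function discussed above (an unpopulate analogue of ttm_resource_manager_evict_all()) might be sketched as follows. The name, the per-device BO list, and the ttm_tt_unpopulate() signature are assumptions standing in for whatever iteration TTM ends up providing over both LRU and pinned BOs:

```c
/*
 * Hedged sketch only: "device_list"/"device_entry" do not exist in TTM
 * as-is; they represent a hypothetical per-device list covering both
 * unpinned (LRU) and pinned BOs.
 */
int ttm_device_unpopulate_all(struct ttm_bo_device *bdev)
{
	struct ttm_buffer_object *bo;

	/*
	 * Locking elided: ttm_tt_unpopulate() may sleep, so a real
	 * implementation cannot simply hold the LRU spinlock here.
	 */
	list_for_each_entry(bo, &bdev->device_list, device_entry)
		if (bo->ttm)
			ttm_tt_unpopulate(bo->ttm);

	return 0;
}
```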

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-25 12:57                         ` Christian König
@ 2020-11-25 16:36                           ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-25 16:36 UTC (permalink / raw)
  To: christian.koenig
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher, yuq825

On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
> Am 25.11.20 um 11:40 schrieb Daniel Vetter:
> > On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> > > Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > > > On 11/24/20 2:41 AM, Christian König wrote:
> > > > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > > > what you are doing with this function.
> > > > > > > > > > > 
> > > > > > > > > > > Christian.
> > > > > > > > > > 
> > > > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > > > notifier per device." to see
> > > > > > > > > > how i use it. I don't see why this should go
> > > > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > > > registered from within TTM
> > > > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > > > No, that is really vendor specific.
> > > > > > > > > 
> > > > > > > > > What I meant is to have a function like
> > > > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > > > to call and all tt objects are unpopulated.
> > > > > > > > 
> > > > > > > > So instead of this BO list i create and later iterate in
> > > > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > > > within
> > > > > > > > TTM with a single function ? Makes much more sense.
> > > > > > > Yes, exactly.
> > > > > > > 
> > > > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > > > actually not the best idea, we should now check the
> > > > > > > pin_count instead. This way we could also have a list of the
> > > > > > > pinned BOs in TTM.
> > > > > > 
> > > > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > > > It's probably a good idea to combine both iterations into this
> > > > > > new function to cover all the BOs allocated on the device.
> > > > > Yes, that's what I had in my mind as well.
> > > > > 
> > > > > > 
> > > > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > > > could have unforeseen consequences.
> > > > > > 
> > > > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > > > accesses once we drop all the DMA backing pages for a particular
> > > > > > BO ? Because for user mappings
> > > > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > > > nothing was done for in kernel CPU mappings.
> > > > > Yes exactly that.
> > > > > 
> > > > > In other words what happens if we free the ring buffer while the
> > > > > kernel still writes to it?
> > > > > 
> > > > > Christian.
> > > > 
> > > > While we can't control user application accesses to the mapped buffers
> > > > explicitly and hence we use page fault rerouting
> > > > I am thinking that in this  case we may be able to sprinkle
> > > > drm_dev_enter/exit in any such sensitive place were we might
> > > > CPU access a DMA buffer from the kernel ?
> > > Yes, I fear we are going to need that.
> > Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
> > could stuff this into begin/end_cpu_access (but only for the kernel, so a
> > bit tricky)?
> 
> Oh very very good point! I haven't thought about DMA-buf mmaps in this
> context yet.
> 
> 
> > btw the other issue with dma-buf (and even worse with dma_fence) is
> > refcounting of the underlying drm_device. I'd expect that all your
> > callbacks go boom if the dma_buf outlives your drm_device. That part isn't
> > yet solved in your series here.
> 
> Well thinking more about this, it seems to be a another really good argument
> why mapping pages from DMA-bufs into application address space directly is a
> very bad idea :)
> 
> But yes, we essentially can't remove the device as long as there is a
> DMA-buf with mappings. No idea how to clean that one up.

drm_dev_get/put in drm_prime helpers should get us like 90% there I think.

The even more worrying thing is random dma_fence attached to the dma_resv
object. We could try to clean all of ours up, but they could have escaped
already into some other driver. And since we're talking about egpu
hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case.

I have no idea how to fix that one :-/
-Daniel
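The drm_dev_get/put idea for the prime helpers can be sketched as follows, with an illustrative wrapper name (example_prime_export is an assumption, not the actual helper): take a drm_device reference when a GEM object is exported as a dma-buf, and drop it when the dma-buf is released.

```c
/*
 * Hedged sketch of "drm_dev_get/put in drm_prime helpers": pin the
 * drm_device for the lifetime of the exported dma-buf so callbacks
 * cannot outlive the device. The wrapper name is illustrative.
 */
struct dma_buf *example_prime_export(struct drm_gem_object *obj, int flags)
{
	struct dma_buf *dmabuf = drm_gem_prime_export(obj, flags);

	if (!IS_ERR(dmabuf))
		drm_dev_get(obj->dev);	/* paired with drm_dev_put() in release */

	return dmabuf;
}
```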

> 
> Christian.
> 
> > -Daniel
> > 
> > > > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > > > Is there other places ?
> > > Puh, good question. I have no idea.
> > > 
> > > > Another point is that at this point the driver shouldn't access any such
> > > > buffers as we are at the process finishing the device.
> > > > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > > > think there is anything else to do ?
> > > Well there is a page fault handler for kernel mappings, but that one just
> > > prints the stack trace into the system log and calls BUG(); :)
> > > 
> > > Long story short we need to avoid any access to released pages after unplug.
> > > No matter if it's from the kernel or userspace.
> > > 
> > > Regards,
> > > Christian.
> > > 
> > > > Andrey
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-25 12:57                         ` Christian König
@ 2020-11-25 16:56                           ` Michel Dänzer
  -1 siblings, 0 replies; 212+ messages in thread
From: Michel Dänzer @ 2020-11-25 16:56 UTC (permalink / raw)
  To: christian.koenig, Daniel Vetter; +Cc: amd-gfx, dri-devel

On 2020-11-25 1:57 p.m., Christian König wrote:
> 
> Well thinking more about this, it seems to be a another really good 
> argument why mapping pages from DMA-bufs into application address space 
> directly is a very bad idea :)

Apologies for going off on a tangent here...

Since allowing userspace mmap with dma-buf fds seems to be a trap in 
general[0], I wonder if there's any way we could stop supporting that?


[0] E.g. mutter had to disable handing out dma-bufs for screen capture 
by default with non-i915 for now, because in particular with discrete 
GPUs, direct CPU reads can be unusably slow (think single-digit frames 
per second), and of course there's other userspace which goes "ooh, 
dma-buf, let's map and read!".


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-25 16:56                           ` Michel Dänzer
@ 2020-11-25 17:02                             ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-25 17:02 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx list, Christian König, dri-devel

On Wed, Nov 25, 2020 at 5:56 PM Michel Dänzer <michel@daenzer.net> wrote:
>
> On 2020-11-25 1:57 p.m., Christian König wrote:
> >
> > Well thinking more about this, it seems to be a another really good
> > argument why mapping pages from DMA-bufs into application address space
> > directly is a very bad idea :)
>
> Apologies for going off on a tangent here...
>
> Since allowing userspace mmap with dma-buf fds seems to be a trap in
> general[0], I wonder if there's any way we could stop supporting that?
>
>
> [0] E.g. mutter had to disable handing out dma-bufs for screen capture
> by default with non-i915 for now, because in particular with discrete
> GPUs, direct CPU reads can be unusably slow (think single-digit frames
> per second), and of course there's other userspace which goes "ooh,
> dma-buf, let's map and read!".

I think a pile of applications (cros included) use it to do uploads
across process boundaries. Think locked down jpeg decoder and stuff
like that. For that use-case it seems to work ok.

But yeah don't read from dma-buf. I'm pretty sure it's dead slow on
almost everything, except integrated gpu which have A) a coherent
fabric with the gpu and B) that fabric is actually faster for
rendering in general, not just for dedicated buffers allocated for
down/upload.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
  2020-11-25  9:04         ` Daniel Vetter
@ 2020-11-25 17:39           ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-25 17:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: amd-gfx list, Christian König, dri-devel, Qiang Yu, Greg KH,
	Alex Deucher


On 11/25/20 4:04 AM, Daniel Vetter wrote:
> On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky
> <Andrey.Grodzovsky@amd.com> wrote:
>>
>> On 11/24/20 9:49 AM, Daniel Vetter wrote:
>>> On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
>>>> Avoids NULL ptr due to kobj->sd being unset on device removal.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
>>>>    2 files changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>> index caf828a..812e592 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>> @@ -27,6 +27,7 @@
>>>>    #include <linux/uaccess.h>
>>>>    #include <linux/reboot.h>
>>>>    #include <linux/syscalls.h>
>>>> +#include <drm/drm_drv.h>
>>>>
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_ras.h"
>>>> @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
>>>>               .attrs = attrs,
>>>>       };
>>>>
>>>> -    sysfs_remove_group(&adev->dev->kobj, &group);
>>>> +    if (!drm_dev_is_unplugged(&adev->ddev))
>>>> +            sysfs_remove_group(&adev->dev->kobj, &group);
>>> This looks wrong. sysfs, like any other interface, should be
>>> unconditionally thrown out when we do the drm_dev_unregister. Whether
>>> hotunplugged or not should matter at all. Either this isn't needed at all,
>>> or something is wrong with the ordering here. But definitely fishy.
>>> -Daniel
>>
>> So technically this is needed because kobject's sysfs directory entry kobj->sd
>> is set to NULL
>> on device removal (from sysfs_remove_dir) but because we don't finalize the device
>> until last reference to drm file is dropped (which can happen later) we end up
>> calling sysfs_remove_file/dir after
>> this pointer is NULL. sysfs_remove_file checks for NULL and aborts while
>> sysfs_remove_dir
>> does not, and that's why I guard against calls to sysfs_remove_dir.
>> But indeed the whole approach in the driver is incorrect, as Greg pointed out -
>> we should use
>> default groups attributes instead of explicit calls to sysfs interface and this
>> would save those troubles.
>> But again, the issue here is scope of work: converting all of amdgpu to default
>> groups attributes is somewhat
>> lengthy process with extra testing as the entire driver is papered with sysfs
>> references and seems to me more of a standalone
>> cleanup, just like switching to devm_ and drmm_ work. To me at least it seems
>> that it makes more sense
>> to finalize and push the hot unplug patches so that this new functionality can
>> be part of the driver sooner
>> and then incrementally improve it by working on those other topics. Just as
>> devm_/drmm_ I also added sysfs cleanup
>> to my TODO list in the RFC patch.
> Hm, whether you solve this with the default group stuff to
> auto-remove, or remove explicitly at the right time doesn't matter
> much. The underlying problem you have here is that it's done way too
> late.

As far as I understand from reading this article by Greg on default group
attrs - https://www.linux.com/news/how-create-sysfs-file-correctly/ -
they are removed together with the device, not too late like now. I quote
from the last paragraph there:

"By setting this value, you don’t have to do anything in your
probe() or release() functions at all in order for the
sysfs files to be properly created and destroyed whenever your
device is added or removed from the system. And you will, most
importantly, do it in a race-free manner, which is always a good thing."

To me this seems like the best solution to the late remove issue. What do
you think?
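The default-groups pattern from that article can be sketched as follows; the attribute and function names are illustrative, not actual amdgpu attributes:

```c
/*
 * Hedged sketch of default attribute groups: the driver core creates
 * and removes these sysfs files together with the device, race-free,
 * so no explicit sysfs_create/remove_group() calls remain in the
 * driver. "fw_version" is an illustrative attribute name.
 */
static ssize_t fw_version_show(struct device *dev,
			       struct device_attribute *attr, char *buf)
{
	/* Real code would read the value from driver data. */
	return sysfs_emit(buf, "0x%08x\n", 0u);
}
static DEVICE_ATTR_RO(fw_version);

static struct attribute *example_attrs[] = {
	&dev_attr_fw_version.attr,
	NULL,
};
ATTRIBUTE_GROUPS(example);

/* At probe time, before the device is registered:
 *	dev->groups = example_groups;
 */
```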


>   sysfs removal (like all uapi interfaces) need to be removed as
> part of drm_dev_unregister.


Do you mean we need to track all sysfs file creation within the low-level
drivers and then call some sysfs release function inside drm_dev_unregister
to iterate and release them all?


>   I guess aside from the split into fini_hw
> and fini_sw, you also need an unregister_late callback (like we have
> already for drm_connector, so that e.g. backlight and similar stuff
> can be unregistered).


Is this the callback you suggest calling from within drm_dev_unregister, making
it responsible for releasing all sysfs files created within the driver?

Andrey


>
> Papering over the underlying bug like this doesn't really fix much,
> the lifetimes are still wrong.
> -Daniel
>
>> Andrey
>>
>>
>>>>       return 0;
>>>>    }
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>>>> index 2b7c90b..54331fc 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>>>> @@ -24,6 +24,7 @@
>>>>    #include <linux/firmware.h>
>>>>    #include <linux/slab.h>
>>>>    #include <linux/module.h>
>>>> +#include <drm/drm_drv.h>
>>>>
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_ucode.h"
>>>> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
>>>>
>>>>    void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
>>>>    {
>>>> -    sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>>>> +    if (!drm_dev_is_unplugged(&adev->ddev))
>>>> +            sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>>>>    }
>>>>
>>>>    static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
>>>> --
>>>> 2.7.4
>>>>
>
>


* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
@ 2020-11-25 17:39           ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-25 17:39 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rob Herring, amd-gfx list, Christian König, dri-devel,
	Anholt, Eric, Pekka Paalanen, Qiang Yu, Greg KH, Alex Deucher,
	Wentland, Harry, Lucas Stach


[-- Attachment #1.1: Type: text/plain, Size: 5644 bytes --]


On 11/25/20 4:04 AM, Daniel Vetter wrote:
> On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky
> <Andrey.Grodzovsky@amd.com> wrote:
>>
>> On 11/24/20 9:49 AM, Daniel Vetter wrote:
>>> On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
>>>> Avoids NULL ptr due to kobj->sd being unset on device removal.
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
>>>>    2 files changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>> index caf828a..812e592 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>> @@ -27,6 +27,7 @@
>>>>    #include <linux/uaccess.h>
>>>>    #include <linux/reboot.h>
>>>>    #include <linux/syscalls.h>
>>>> +#include <drm/drm_drv.h>
>>>>
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_ras.h"
>>>> @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
>>>>               .attrs = attrs,
>>>>       };
>>>>
>>>> -    sysfs_remove_group(&adev->dev->kobj, &group);
>>>> +    if (!drm_dev_is_unplugged(&adev->ddev))
>>>> +            sysfs_remove_group(&adev->dev->kobj, &group);
>>> This looks wrong. sysfs, like any other interface, should be
>>> unconditionally thrown out when we do the drm_dev_unregister. Whether
>>> hotunplugged or not shouldn't matter at all. Either this isn't needed at
>>> all, or something is wrong with the ordering here. But definitely fishy.
>>> -Daniel
>>
>> So technically this is needed because the kobject's sysfs directory
>> entry, kobj->sd, is set to NULL on device removal (from sysfs_remove_dir),
>> but because we don't finalize the device until the last reference to the
>> drm file is dropped (which can happen much later), we end up calling
>> sysfs_remove_file/dir after this pointer is already NULL.
>> sysfs_remove_file checks for NULL and bails out, while sysfs_remove_dir
>> does not, and that is why I guard the calls to sysfs_remove_dir.
>> But indeed the whole approach in the driver is incorrect, as Greg pointed
>> out - we should use default attribute groups instead of explicit calls to
>> the sysfs interface, and that would spare us these troubles.
>> But again, the issue here is scope of work: converting all of amdgpu to
>> default attribute groups is a somewhat lengthy process with extra testing,
>> as the entire driver is peppered with sysfs references, and it seems to me
>> more of a standalone cleanup, just like the switch to devm_ and drmm_.
>> To me at least it makes more sense to finalize and push the hot unplug
>> patches, so that this new functionality lands in the driver sooner, and
>> then incrementally improve it by working on those other topics. Just as
>> with devm_/drmm_, I also added the sysfs cleanup to my TODO list in the
>> RFC patch.
> Hm, whether you solve this with the default group stuff to
> auto-remove, or remove explicitly at the right time doesn't matter
> much. The underlying problem you have here is that it's done way too
> late.

As far as I understood default attribute groups from reading this article
by Greg - https://www.linux.com/news/how-create-sysfs-file-correctly/ -
the files will be removed together with the device, and not too late as
happens now. Quoting the last paragraph there:

"By setting this value, you don’t have to do anything in your
probe() or release() functions at all in order for the
sysfs files to be properly created and destroyed whenever your
device is added or removed from the system. And you will, most
importantly, do it in a race-free manner, which is always a good thing."

To me this seems like the best solution to the late remove issue. What do
you think ?
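
For illustration, the race-free pattern from Greg's article looks roughly
like this (sketch only; "fw_version" is a made-up attribute, not one amdgpu
actually defines):

```c
/* Sketch of the default-groups pattern from the article above. */
static ssize_t fw_version_show(struct device *dev,
			       struct device_attribute *attr, char *buf)
{
	/* illustrative value; a real driver would read device state */
	return scnprintf(buf, PAGE_SIZE, "0x%08x\n", 0u);
}
static DEVICE_ATTR_RO(fw_version);

static struct attribute *fw_attrs[] = {
	&dev_attr_fw_version.attr,
	NULL,
};
ATTRIBUTE_GROUPS(fw);	/* defines fw_groups */

/* Pointing struct device_driver.dev_groups at fw_groups makes the driver
 * core create and remove the files together with the device, so no
 * explicit sysfs_remove_group() call can ever run too late. */
```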


>   sysfs removal (like all uapi interfaces) need to be removed as
> part of drm_dev_unregister.


Do you mean we need to track all sysfs file creation within the low level
drivers and then call some sysfs release function inside drm_dev_unregister
to iterate over them and release them all?


>   I guess aside from the split into fini_hw
> and fini_sw, you also need an unregister_late callback (like we have
> already for drm_connector, so that e.g. backlight and similar stuff
> can be unregistered).


Is this the callback you suggest calling from within drm_dev_unregister,
making it responsible for releasing all sysfs files created within the driver?
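
For reference, the generic DRM guard for code paths that may race with
hotunplug looks roughly like this (function name is illustrative, not from
the series):

```c
/* Sketch: drm_dev_enter()/drm_dev_exit() protect a critical section
 * against a concurrent drm_dev_unplug(). */
static void example_access(struct amdgpu_device *adev)
{
	int idx;

	if (!drm_dev_enter(adev_to_drm(adev), &idx))
		return;	/* device was unplugged, skip the access */

	/* ... safe to touch device-backed state here ... */

	drm_dev_exit(idx);
}
```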

Andrey


>
> Papering over the underlying bug like this doesn't really fix much,
> the lifetimes are still wrong.
> -Daniel
>
>> Andrey
>>
>>
>>>>       return 0;
>>>>    }
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>>>> index 2b7c90b..54331fc 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>>>> @@ -24,6 +24,7 @@
>>>>    #include <linux/firmware.h>
>>>>    #include <linux/slab.h>
>>>>    #include <linux/module.h>
>>>> +#include <drm/drm_drv.h>
>>>>
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_ucode.h"
>>>> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
>>>>
>>>>    void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
>>>>    {
>>>> -    sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>>>> +    if (!drm_dev_is_unplugged(&adev->ddev))
>>>> +            sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>>>>    }
>>>>
>>>>    static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
>>>> --
>>>> 2.7.4
>>>>
>
>


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late
  2020-11-25 10:41         ` Daniel Vetter
@ 2020-11-25 17:41           ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-25 17:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: gregkh, ckoenig.leichtzumerken, dri-devel, amd-gfx,
	daniel.vetter, Alexander.Deucher, yuq825


On 11/25/20 5:41 AM, Daniel Vetter wrote:
> On Tue, Nov 24, 2020 at 10:51:57AM -0500, Andrey Grodzovsky wrote:
>> On 11/24/20 9:53 AM, Daniel Vetter wrote:
>>> On Sat, Nov 21, 2020 at 12:21:18AM -0500, Andrey Grodzovsky wrote:
>>>> Some of the stuff in amdgpu_device_fini such as HW interrupts
>>>> disable and pending fences finalization must be done right away on
>>>> pci_remove, while most of the stuff which relates to finalizing and
>>>> releasing driver data structures can be kept until the
>>>> drm_driver.release hook is called, i.e. when the last device
>>>> reference is dropped.
>>>>
>>> Uh fini_late and fini_early are rather meaningless namings, since it's
>>> not clear why there's a split. If you used drm_connector_funcs as
>>> inspiration, that's kinda not good because 'register' itself is a
>>> reserved keyword. That's why we had to add the late_ prefix, could as
>>> well have used C_sucks_ as prefix :-) And then early_unregister for
>>> consistency.
>>>
>>> I think fini_hw and fini_sw (or maybe fini_drm) would be a lot clearer
>>> about what they're doing.
>>>
>>> I still strongly recommend that you cut over as much as possible of the
>>> fini_hw work to devm_ and for the fini_sw/drm stuff there's drmm_
>>> -Daniel
>>
>> Definitely, and I put it in a TODO list in the RFC patch. Also, as I
>> mentioned before - I just prefer to leave it for follow-up work because
>> it's non-trivial and requires shuffling a lot of stuff around in the
>> driver. I was thinking of committing the work in incremental steps so
>> it's easier to merge and to control for breakages.
> Yeah doing devm/drmm conversion later on makes sense. I'd still try to
> have better names than what you're currently going with. A few of these
> will likely stick around for very long, not just interim.
> -Daniel

Will do.

Andrey
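
For what it's worth, the drmm_ conversion Daniel refers to would let the
SW-side teardown run automatically when the last drm_device reference
drops; a hedged sketch, not part of this series, assuming the fini split
from this patch is in place:

```c
/* Sketch: register the SW-side teardown as a drmm_ managed action so it
 * runs when the final drm_device reference is dropped, replacing the
 * explicit drm_driver.release hook added by this patch. */
static void amdgpu_fini_sw_action(struct drm_device *dev, void *ptr)
{
	struct amdgpu_device *adev = ptr;

	amdgpu_device_fini_late(adev);
}

/* somewhere in amdgpu_device_init(): */
	r = drmm_add_action_or_reset(adev_to_drm(adev),
				     amdgpu_fini_sw_action, adev);
	if (r)
		return r;
```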


>
>> Andrey
>>
>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +++++-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 ++++++++++++----
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  7 ++-----
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++++++++++++++-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    | 24 +++++++++++++++---------
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h    |  1 +
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    | 12 +++++++++++-
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c    |  3 +++
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
>>>>    9 files changed, 65 insertions(+), 22 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index 83ac06a..6243f6d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -1063,7 +1063,9 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
>>>>    int amdgpu_device_init(struct amdgpu_device *adev,
>>>>    		       uint32_t flags);
>>>> -void amdgpu_device_fini(struct amdgpu_device *adev);
>>>> +void amdgpu_device_fini_early(struct amdgpu_device *adev);
>>>> +void amdgpu_device_fini_late(struct amdgpu_device *adev);
>>>> +
>>>>    int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
>>>>    void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
>>>> @@ -1275,6 +1277,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
>>>>    int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
>>>>    void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>>>    				 struct drm_file *file_priv);
>>>> +void amdgpu_driver_release_kms(struct drm_device *dev);
>>>> +
>>>>    int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
>>>>    int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
>>>>    int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 2f60b70..797d94d 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -3557,14 +3557,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>>     * Tear down the driver info (all asics).
>>>>     * Called at driver shutdown.
>>>>     */
>>>> -void amdgpu_device_fini(struct amdgpu_device *adev)
>>>> +void amdgpu_device_fini_early(struct amdgpu_device *adev)
>>>>    {
>>>>    	dev_info(adev->dev, "amdgpu: finishing device.\n");
>>>>    	flush_delayed_work(&adev->delayed_init_work);
>>>>    	adev->shutdown = true;
>>>> -	kfree(adev->pci_state);
>>>> -
>>>>    	/* make sure IB test finished before entering exclusive mode
>>>>    	 * to avoid preemption on IB test
>>>>    	 * */
>>>> @@ -3581,11 +3579,18 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>>>>    		else
>>>>    			drm_atomic_helper_shutdown(adev_to_drm(adev));
>>>>    	}
>>>> -	amdgpu_fence_driver_fini(adev);
>>>> +	amdgpu_fence_driver_fini_early(adev);
>>>>    	if (adev->pm_sysfs_en)
>>>>    		amdgpu_pm_sysfs_fini(adev);
>>>>    	amdgpu_fbdev_fini(adev);
>>>> +
>>>> +	amdgpu_irq_fini_early(adev);
>>>> +}
>>>> +
>>>> +void amdgpu_device_fini_late(struct amdgpu_device *adev)
>>>> +{
>>>>    	amdgpu_device_ip_fini(adev);
>>>> +	amdgpu_fence_driver_fini_late(adev);
>>>>    	release_firmware(adev->firmware.gpu_info_fw);
>>>>    	adev->firmware.gpu_info_fw = NULL;
>>>>    	adev->accel_working = false;
>>>> @@ -3621,6 +3626,9 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
>>>>    		amdgpu_pmu_fini(adev);
>>>>    	if (adev->mman.discovery_bin)
>>>>    		amdgpu_discovery_fini(adev);
>>>> +
>>>> +	kfree(adev->pci_state);
>>>> +
>>>>    }
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> index 7f98cf1..3d130fc 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>> @@ -1244,14 +1244,10 @@ amdgpu_pci_remove(struct pci_dev *pdev)
>>>>    {
>>>>    	struct drm_device *dev = pci_get_drvdata(pdev);
>>>> -#ifdef MODULE
>>>> -	if (THIS_MODULE->state != MODULE_STATE_GOING)
>>>> -#endif
>>>> -		DRM_ERROR("Hotplug removal is not supported\n");
>>>>    	drm_dev_unplug(dev);
>>>>    	amdgpu_driver_unload_kms(dev);
>>>> +
>>>>    	pci_disable_device(pdev);
>>>> -	pci_set_drvdata(pdev, NULL);
>>>>    	drm_dev_put(dev);
>>>>    }
>>>> @@ -1557,6 +1553,7 @@ static struct drm_driver kms_driver = {
>>>>    	.dumb_create = amdgpu_mode_dumb_create,
>>>>    	.dumb_map_offset = amdgpu_mode_dumb_mmap,
>>>>    	.fops = &amdgpu_driver_kms_fops,
>>>> +	.release = &amdgpu_driver_release_kms,
>>>>    	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
>>>>    	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> index d0b0021..c123aa6 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> @@ -523,7 +523,7 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
>>>>     *
>>>>     * Tear down the fence driver for all possible rings (all asics).
>>>>     */
>>>> -void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>>>> +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev)
>>>>    {
>>>>    	unsigned i, j;
>>>>    	int r;
>>>> @@ -544,6 +544,19 @@ void amdgpu_fence_driver_fini(struct amdgpu_device *adev)
>>>>    		if (!ring->no_scheduler)
>>>>    			drm_sched_fini(&ring->sched);
>>>>    		del_timer_sync(&ring->fence_drv.fallback_timer);
>>>> +	}
>>>> +}
>>>> +
>>>> +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev)
>>>> +{
>>>> +	unsigned int i, j;
>>>> +
>>>> +	for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
>>>> +		struct amdgpu_ring *ring = adev->rings[i];
>>>> +
>>>> +		if (!ring || !ring->fence_drv.initialized)
>>>> +			continue;
>>>> +
>>>>    		for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
>>>>    			dma_fence_put(ring->fence_drv.fences[j]);
>>>>    		kfree(ring->fence_drv.fences);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> index 300ac73..a833197 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>>> @@ -49,6 +49,7 @@
>>>>    #include <drm/drm_irq.h>
>>>>    #include <drm/drm_vblank.h>
>>>>    #include <drm/amdgpu_drm.h>
>>>> +#include <drm/drm_drv.h>
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_ih.h"
>>>>    #include "atom.h"
>>>> @@ -297,6 +298,20 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>>>    	return 0;
>>>>    }
>>>> +
>>>> +void amdgpu_irq_fini_early(struct amdgpu_device *adev)
>>>> +{
>>>> +	if (adev->irq.installed) {
>>>> +		drm_irq_uninstall(&adev->ddev);
>>>> +		adev->irq.installed = false;
>>>> +		if (adev->irq.msi_enabled)
>>>> +			pci_free_irq_vectors(adev->pdev);
>>>> +
>>>> +		if (!amdgpu_device_has_dc_support(adev))
>>>> +			flush_work(&adev->hotplug_work);
>>>> +	}
>>>> +}
>>>> +
>>>>    /**
>>>>     * amdgpu_irq_fini - shut down interrupt handling
>>>>     *
>>>> @@ -310,15 +325,6 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>>>>    {
>>>>    	unsigned i, j;
>>>> -	if (adev->irq.installed) {
>>>> -		drm_irq_uninstall(adev_to_drm(adev));
>>>> -		adev->irq.installed = false;
>>>> -		if (adev->irq.msi_enabled)
>>>> -			pci_free_irq_vectors(adev->pdev);
>>>> -		if (!amdgpu_device_has_dc_support(adev))
>>>> -			flush_work(&adev->hotplug_work);
>>>> -	}
>>>> -
>>>>    	for (i = 0; i < AMDGPU_IRQ_CLIENTID_MAX; ++i) {
>>>>    		if (!adev->irq.client[i].sources)
>>>>    			continue;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>>>> index c718e94..718c70f 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h
>>>> @@ -104,6 +104,7 @@ irqreturn_t amdgpu_irq_handler(int irq, void *arg);
>>>>    int amdgpu_irq_init(struct amdgpu_device *adev);
>>>>    void amdgpu_irq_fini(struct amdgpu_device *adev);
>>>> +void amdgpu_irq_fini_early(struct amdgpu_device *adev);
>>>>    int amdgpu_irq_add_id(struct amdgpu_device *adev,
>>>>    		      unsigned client_id, unsigned src_id,
>>>>    		      struct amdgpu_irq_src *source);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> index a0af8a7..9e30c5c 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> @@ -29,6 +29,7 @@
>>>>    #include "amdgpu.h"
>>>>    #include <drm/drm_debugfs.h>
>>>>    #include <drm/amdgpu_drm.h>
>>>> +#include <drm/drm_drv.h>
>>>>    #include "amdgpu_sched.h"
>>>>    #include "amdgpu_uvd.h"
>>>>    #include "amdgpu_vce.h"
>>>> @@ -94,7 +95,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>>>    	}
>>>>    	amdgpu_acpi_fini(adev);
>>>> -	amdgpu_device_fini(adev);
>>>> +	amdgpu_device_fini_early(adev);
>>>>    }
>>>>    void amdgpu_register_gpu_instance(struct amdgpu_device *adev)
>>>> @@ -1147,6 +1148,15 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
>>>>    	pm_runtime_put_autosuspend(dev->dev);
>>>>    }
>>>> +
>>>> +void amdgpu_driver_release_kms(struct drm_device *dev)
>>>> +{
>>>> +	struct amdgpu_device *adev = drm_to_adev(dev);
>>>> +
>>>> +	amdgpu_device_fini_late(adev);
>>>> +	pci_set_drvdata(adev->pdev, NULL);
>>>> +}
>>>> +
>>>>    /*
>>>>     * VBlank related functions.
>>>>     */
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>> index 9d11b84..caf828a 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>> @@ -2142,9 +2142,12 @@ int amdgpu_ras_pre_fini(struct amdgpu_device *adev)
>>>>    {
>>>>    	struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>>>> +	//DRM_ERROR("adev 0x%llx", (long long unsigned int)adev);
>>>> +
>>>>    	if (!con)
>>>>    		return 0;
>>>> +
>>>>    	/* Need disable ras on all IPs here before ip [hw/sw]fini */
>>>>    	amdgpu_ras_disable_all_features(adev, 0);
>>>>    	amdgpu_ras_recovery_fini(adev);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> index 7112137..074f36b 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> @@ -107,7 +107,8 @@ struct amdgpu_fence_driver {
>>>>    };
>>>>    int amdgpu_fence_driver_init(struct amdgpu_device *adev);
>>>> -void amdgpu_fence_driver_fini(struct amdgpu_device *adev);
>>>> +void amdgpu_fence_driver_fini_early(struct amdgpu_device *adev);
>>>> +void amdgpu_fence_driver_fini_late(struct amdgpu_device *adev);
>>>>    void amdgpu_fence_driver_force_completion(struct amdgpu_ring *ring);
>>>>    int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
>>>> -- 
>>>> 2.7.4
>>>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-25 16:36                           ` Daniel Vetter
@ 2020-11-25 19:34                             ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-25 19:34 UTC (permalink / raw)
  To: Daniel Vetter, christian.koenig
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Alexander.Deucher, yuq825


On 11/25/20 11:36 AM, Daniel Vetter wrote:
> On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
>> Am 25.11.20 um 11:40 schrieb Daniel Vetter:
>>> On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
>>>> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
>>>>> On 11/24/20 2:41 AM, Christian König wrote:
>>>>>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
>>>>>>> On 11/23/20 3:41 PM, Christian König wrote:
>>>>>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>>>>>>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>>>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>>>>>>>> before device's IOMMU group is released.
>>>>>>>>>>>> It would be cleaner if we could do the whole
>>>>>>>>>>>> handling in TTM. I also need to double check
>>>>>>>>>>>> what you are doing with this function.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology
>>>>>>>>>>> notifier per device." to see
>>>>>>>>>>> how i use it. I don't see why this should go
>>>>>>>>>>> into TTM mid-layer - the stuff I do inside
>>>>>>>>>>> is vendor specific and also I don't think TTM is
>>>>>>>>>>> explicitly aware of IOMMU ?
>>>>>>>>>>> Do you mean you prefer the IOMMU notifier to be
>>>>>>>>>>> registered from within TTM
>>>>>>>>>>> and then use a hook to call into vendor specific handler ?
>>>>>>>>>> No, that is really vendor specific.
>>>>>>>>>>
>>>>>>>>>> What I meant is to have a function like
>>>>>>>>>> ttm_resource_manager_evict_all() which you only need
>>>>>>>>>> to call and all tt objects are unpopulated.
>>>>>>>>> So instead of this BO list i create and later iterate in
>>>>>>>>> amdgpu from the IOMMU patch you just want to do it
>>>>>>>>> within
>>>>>>>>> TTM with a single function ? Makes much more sense.
>>>>>>>> Yes, exactly.
>>>>>>>>
>>>>>>>> The list_empty() checks we have in TTM for the LRU are
>>>>>>>> actually not the best idea, we should now check the
>>>>>>>> pin_count instead. This way we could also have a list of the
>>>>>>>> pinned BOs in TTM.
>>>>>>> So from my IOMMU topology handler I will iterate the TTM LRU for
>>>>>>> the unpinned BOs and this new function for the pinned ones  ?
>>>>>>> It's probably a good idea to combine both iterations into this
>>>>>>> new function to cover all the BOs allocated on the device.
>>>>>> Yes, that's what I had in my mind as well.
>>>>>>
>>>>>>>> BTW: Have you thought about what happens when we unpopulate
>>>>>>>> a BO while we still try to use a kernel mapping for it? That
>>>>>>>> could have unforeseen consequences.
>>>>>>> Are you asking what happens to kmap or vmap style mapped CPU
>>>>>>> accesses once we drop all the DMA backing pages for a particular
>>>>>>> BO ? Because for user mappings
>>>>>>> (mmap) we took care of this with dummy page reroute but indeed
>>>>>>> nothing was done for in kernel CPU mappings.
>>>>>> Yes exactly that.
>>>>>>
>>>>>> In other words what happens if we free the ring buffer while the
>>>>>> kernel still writes to it?
>>>>>>
>>>>>> Christian.
>>>>> While we can't control user application accesses to the mapped buffers
>>>>> explicitly and hence we use page fault rerouting
>>>>> I am thinking that in this  case we may be able to sprinkle
>>>>> drm_dev_enter/exit in any such sensitive place were we might
>>>>> CPU access a DMA buffer from the kernel ?
>>>> Yes, I fear we are going to need that.
>>> Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
>>> could stuff this into begin/end_cpu_access


Do you mean guarding with drm_dev_enter/exit in the driver-specific
dma_buf_ops.begin/end_cpu_access hook ?


>>> (but only for the kernel, so a
>>> bit tricky)?


Why only the kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by
some user process ? And if we do need this distinction I think we should be able to
differentiate by looking at the current->mm (i.e. mm_struct) pointer being NULL for
a kernel thread.


>> Oh very very good point! I haven't thought about DMA-buf mmaps in this
>> context yet.
>>
>>
>>> btw the other issue with dma-buf (and even worse with dma_fence) is
>>> refcounting of the underlying drm_device. I'd expect that all your
>>> callbacks go boom if the dma_buf outlives your drm_device. That part isn't
>>> yet solved in your series here.
>> Well thinking more about this, it seems to be a another really good argument
>> why mapping pages from DMA-bufs into application address space directly is a
>> very bad idea :)
>>
>> But yes, we essentially can't remove the device as long as there is a
>> DMA-buf with mappings. No idea how to clean that one up.
> drm_dev_get/put in drm_prime helpers should get us like 90% there I think.


What are the other 10% ?


>
> The even more worrying thing is random dma_fence attached to the dma_resv
> object. We could try to clean all of ours up, but they could have escaped
> already into some other driver. And since we're talking about egpu
> hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case.
>
> I have no idea how to fix that one :-/
> -Daniel


I assume you are referring to the sync_file_create/sync_file_get_fence API for
dma_fence export/import ?
So with DMA-bufs we have the drm_gem_object as exporter-specific private data,
and so we can do drm_dev_get and put at the drm_gem_object layer to bind the
device life cycle to that of each GEM object. But we don't have such a mid-layer
for dma_fence which could allow us to increment the device reference for each
fence out there related to that device - is my understanding correct ?

Andrey


>> Christian.
>>
>>> -Daniel
>>>
>>>>> Things like CPU page table updates, ring buffer accesses and FW memcpy ?
>>>>> Is there other places ?
>>>> Puh, good question. I have no idea.
>>>>
>>>>> Another point is that at this point the driver shouldn't access any such
>>>>> buffers as we are at the process finishing the device.
>>>>> AFAIK there is no page fault mechanism for kernel mappings so I don't
>>>>> think there is anything else to do ?
>>>> Well there is a page fault handler for kernel mappings, but that one just
>>>> prints the stack trace into the system log and calls BUG(); :)
>>>>
>>>> Long story short we need to avoid any access to released pages after unplug.
>>>> No matter if it's from the kernel or userspace.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Andrey
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-25 19:34                             ` Andrey Grodzovsky
@ 2020-11-27 13:10                               ` Grodzovsky, Andrey
  -1 siblings, 0 replies; 212+ messages in thread
From: Grodzovsky, Andrey @ 2020-11-27 13:10 UTC (permalink / raw)
  To: Daniel Vetter, Koenig, Christian
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Deucher, Alexander, yuq825


[-- Attachment #1.1: Type: text/plain, Size: 7581 bytes --]

Hey Daniel, just a ping on a bunch of questions I posted below.

Andrey
________________________________
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Sent: 25 November 2020 14:34
To: Daniel Vetter <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: robh@kernel.org <robh@kernel.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; eric@anholt.net <eric@anholt.net>; ppaalanen@gmail.com <ppaalanen@gmail.com>; amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; Deucher, Alexander <Alexander.Deucher@amd.com>; l.stach@pengutronix.de <l.stach@pengutronix.de>; Wentland, Harry <Harry.Wentland@amd.com>; yuq825@gmail.com <yuq825@gmail.com>
Subject: Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use


On 11/25/20 11:36 AM, Daniel Vetter wrote:
> On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
>> Am 25.11.20 um 11:40 schrieb Daniel Vetter:
>>> On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
>>>> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
>>>>> On 11/24/20 2:41 AM, Christian König wrote:
>>>>>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
>>>>>>> On 11/23/20 3:41 PM, Christian König wrote:
>>>>>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>>>>>>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>>>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>>>>>>>> before device's IOMMU group is released.
>>>>>>>>>>>> It would be cleaner if we could do the whole
>>>>>>>>>>>> handling in TTM. I also need to double check
>>>>>>>>>>>> what you are doing with this function.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology
>>>>>>>>>>> notifier per device." to see
>>>>>>>>>>> how i use it. I don't see why this should go
>>>>>>>>>>> into TTM mid-layer - the stuff I do inside
>>>>>>>>>>> is vendor specific and also I don't think TTM is
>>>>>>>>>>> explicitly aware of IOMMU ?
>>>>>>>>>>> Do you mean you prefer the IOMMU notifier to be
>>>>>>>>>>> registered from within TTM
>>>>>>>>>>> and then use a hook to call into vendor specific handler ?
>>>>>>>>>> No, that is really vendor specific.
>>>>>>>>>>
>>>>>>>>>> What I meant is to have a function like
>>>>>>>>>> ttm_resource_manager_evict_all() which you only need
>>>>>>>>>> to call and all tt objects are unpopulated.
>>>>>>>>> So instead of this BO list i create and later iterate in
>>>>>>>>> amdgpu from the IOMMU patch you just want to do it
>>>>>>>>> within
>>>>>>>>> TTM with a single function ? Makes much more sense.
>>>>>>>> Yes, exactly.
>>>>>>>>
>>>>>>>> The list_empty() checks we have in TTM for the LRU are
>>>>>>>> actually not the best idea, we should now check the
>>>>>>>> pin_count instead. This way we could also have a list of the
>>>>>>>> pinned BOs in TTM.
>>>>>>> So from my IOMMU topology handler I will iterate the TTM LRU for
>>>>>>> the unpinned BOs and this new function for the pinned ones  ?
>>>>>>> It's probably a good idea to combine both iterations into this
>>>>>>> new function to cover all the BOs allocated on the device.
>>>>>> Yes, that's what I had in my mind as well.
>>>>>>
>>>>>>>> BTW: Have you thought about what happens when we unpopulate
>>>>>>>> a BO while we still try to use a kernel mapping for it? That
>>>>>>>> could have unforeseen consequences.
>>>>>>> Are you asking what happens to kmap or vmap style mapped CPU
>>>>>>> accesses once we drop all the DMA backing pages for a particular
>>>>>>> BO ? Because for user mappings
>>>>>>> (mmap) we took care of this with dummy page reroute but indeed
>>>>>>> nothing was done for in kernel CPU mappings.
>>>>>> Yes exactly that.
>>>>>>
>>>>>> In other words what happens if we free the ring buffer while the
>>>>>> kernel still writes to it?
>>>>>>
>>>>>> Christian.
>>>>> While we can't control user application accesses to the mapped buffers
>>>>> explicitly and hence we use page fault rerouting
>>>>> I am thinking that in this  case we may be able to sprinkle
>>>>> drm_dev_enter/exit in any such sensitive place were we might
>>>>> CPU access a DMA buffer from the kernel ?
>>>> Yes, I fear we are going to need that.
>>> Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
>>> could stuff this into begin/end_cpu_access


Do you mean guarding with drm_dev_enter/exit in the driver-specific
dma_buf_ops.begin/end_cpu_access hook ?


>>> (but only for the kernel, so a
>>> bit tricky)?


Why only the kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by
some user process ? And if we do need this distinction I think we should be able to
differentiate by looking at the current->mm (i.e. mm_struct) pointer being NULL for
a kernel thread.


>> Oh very very good point! I haven't thought about DMA-buf mmaps in this
>> context yet.
>>
>>
>>> btw the other issue with dma-buf (and even worse with dma_fence) is
>>> refcounting of the underlying drm_device. I'd expect that all your
>>> callbacks go boom if the dma_buf outlives your drm_device. That part isn't
>>> yet solved in your series here.
>> Well thinking more about this, it seems to be a another really good argument
>> why mapping pages from DMA-bufs into application address space directly is a
>> very bad idea :)
>>
>> But yes, we essentially can't remove the device as long as there is a
>> DMA-buf with mappings. No idea how to clean that one up.
> drm_dev_get/put in drm_prime helpers should get us like 90% there I think.


What are the other 10% ?


>
> The even more worrying thing is random dma_fence attached to the dma_resv
> object. We could try to clean all of ours up, but they could have escaped
> already into some other driver. And since we're talking about egpu
> hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case.
>
> I have no idea how to fix that one :-/
> -Daniel


I assume you are referring to the sync_file_create/sync_file_get_fence API for
dma_fence export/import ?
So with DMA-bufs we have the drm_gem_object as exporter-specific private data,
and so we can do drm_dev_get and put at the drm_gem_object layer to bind the
device life cycle to that of each GEM object. But we don't have such a mid-layer
for dma_fence which could allow us to increment the device reference for each
fence out there related to that device - is my understanding correct ?

Andrey


>> Christian.
>>
>>> -Daniel
>>>
>>>>> Things like CPU page table updates, ring buffer accesses and FW memcpy ?
>>>>> Is there other places ?
>>>> Puh, good question. I have no idea.
>>>>
>>>>> Another point is that at this point the driver shouldn't access any such
>>>>> buffers as we are at the process finishing the device.
>>>>> AFAIK there is no page fault mechanism for kernel mappings so I don't
>>>>> think there is anything else to do ?
>>>> Well there is a page fault handler for kernel mappings, but that one just
>>>> prints the stack trace into the system log and calls BUG(); :)
>>>>
>>>> Long story short we need to avoid any access to released pages after unplug.
>>>> No matter if it's from the kernel or userspace.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Andrey

[-- Attachment #1.2: Type: text/html, Size: 10946 bytes --]


^ permalink raw reply	[flat|nested] 212+ messages in thread


* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
  2020-11-25 17:39           ` Andrey Grodzovsky
@ 2020-11-27 13:12             ` Grodzovsky, Andrey
  -1 siblings, 0 replies; 212+ messages in thread
From: Grodzovsky, Andrey @ 2020-11-27 13:12 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: amd-gfx list, Christian König, dri-devel, Qiang Yu, Greg KH,
	Deucher, Alexander


[-- Attachment #1.1: Type: text/plain, Size: 6230 bytes --]

Hey, just a ping on my comments/questions below.

Andrey
________________________________
From: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>
Sent: 25 November 2020 12:39
To: Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx list <amd-gfx@lists.freedesktop.org>; dri-devel <dri-devel@lists.freedesktop.org>; Christian König <ckoenig.leichtzumerken@gmail.com>; Rob Herring <robh@kernel.org>; Lucas Stach <l.stach@pengutronix.de>; Qiang Yu <yuq825@gmail.com>; Anholt, Eric <eric@anholt.net>; Pekka Paalanen <ppaalanen@gmail.com>; Deucher, Alexander <Alexander.Deucher@amd.com>; Greg KH <gregkh@linuxfoundation.org>; Wentland, Harry <Harry.Wentland@amd.com>
Subject: Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug



On 11/25/20 4:04 AM, Daniel Vetter wrote:

On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com><mailto:Andrey.Grodzovsky@amd.com> wrote:




On 11/24/20 9:49 AM, Daniel Vetter wrote:


On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:


Avoids NULL ptr due to kobj->sd being unset on device removal.

Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com><mailto:andrey.grodzovsky@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
  2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index caf828a..812e592 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -27,6 +27,7 @@
  #include <linux/uaccess.h>
  #include <linux/reboot.h>
  #include <linux/syscalls.h>
+#include <drm/drm_drv.h>

  #include "amdgpu.h"
  #include "amdgpu_ras.h"
@@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
             .attrs = attrs,
     };

-    sysfs_remove_group(&adev->dev->kobj, &group);
+    if (!drm_dev_is_unplugged(&adev->ddev))
+            sysfs_remove_group(&adev->dev->kobj, &group);


This looks wrong. sysfs, like any other interface, should be
unconditionally thrown out when we do the drm_dev_unregister. Whether
hotunplugged or not shouldn't matter at all. Either this isn't needed at all,
or something is wrong with the ordering here. But definitely fishy.
-Daniel




So technically this is needed because the kobject's sysfs directory entry kobj->sd
is set to NULL on device removal (from sysfs_remove_dir), but because we don't
finalize the device until the last reference to the drm file is dropped (which can
happen later) we end up calling sysfs_remove_file/dir after this pointer is
already NULL. sysfs_remove_file checks for NULL and aborts, while
sysfs_remove_dir does not, and that's why I guard against calls to sysfs_remove_dir.
But indeed the whole approach in the driver is incorrect, as Greg pointed out -
we should use
default attribute groups instead of explicit calls to the sysfs interface, and this
would save us those troubles.
But again, the issue here is scope of work: converting all of amdgpu to default
attribute groups is a somewhat
lengthy process with extra testing, as the entire driver is peppered with sysfs
references, and it seems to me more of a standalone
cleanup, just like the switch to devm_ and drmm_. To me at least it seems
to make more sense
to finalize and push the hot unplug patches so that this new functionality can
be part of the driver sooner,
and then incrementally improve it by working on those other topics. Just as with
devm_/drmm_, I also added sysfs cleanup
to my TODO list in the RFC patch.



Hm, whether you solve this with the default group stuff to
auto-remove, or remove explicitly at the right time doesn't matter
much. The underlying problem you have here is that it's done way too
late.

As far as I understand the default group attrs correctly from reading this
article by Greg - https://www.linux.com/news/how-create-sysfs-file-correctly/ -
they will be removed together with the device, and not too late like now, and I
quote from the last paragraph there:

"By setting this value, you don’t have to do anything in your
probe() or release() functions at all in order for the
sysfs files to be properly created and destroyed whenever your
device is added or removed from the system. And you will, most
importantly, do it in a race-free manner, which is always a good thing."

To me this seems like the best solution to the late remove issue. What do
you think ?
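For reference, the default attribute group pattern from Greg's article can be
sketched roughly like this. All names here (fw_version, my_pci_driver) are
illustrative only, not amdgpu's actual attributes or driver struct - this is a
sketch of the pattern, not a tested conversion:

```c
/* Sketch of the default-groups pattern: attributes are declared up front
 * and attached through dev_groups, so the driver core creates and removes
 * the sysfs files together with the device - no explicit
 * sysfs_create_group()/sysfs_remove_group() calls in probe()/remove(). */
static ssize_t fw_version_show(struct device *dev,
			       struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "0x%08x\n", 0 /* read fw version here */);
}
static DEVICE_ATTR_RO(fw_version);

static struct attribute *fw_attrs[] = {
	&dev_attr_fw_version.attr,
	NULL,
};
ATTRIBUTE_GROUPS(fw);	/* generates fw_groups from fw_attrs */

static struct pci_driver my_pci_driver = {
	.name              = "my_driver",
	.driver.dev_groups = fw_groups,
	/* .probe / .remove need no sysfs handling at all */
};
```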


 sysfs files (like all uapi interfaces) need to be removed as
part of drm_dev_unregister.


Do you mean we need to track and aggregate all sysfs file creation within
the low-level drivers and then call some sysfs release function inside drm_dev_unregister
to iterate over and release them all ?


 I guess aside from the split into fini_hw
and fini_sw, you also need an unregister_late callback (like we have
already for drm_connector, so that e.g. backlight and similar stuff
can be unregistered).


Is this the callback you suggest calling from within drm_dev_unregister, which
would then be responsible for releasing all sysfs files created within the driver ?

Andrey




Papering over the underlying bug like this doesn't really fix much,
the lifetimes are still wrong.
-Daniel




Andrey








     return 0;
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index 2b7c90b..54331fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -24,6 +24,7 @@
  #include <linux/firmware.h>
  #include <linux/slab.h>
  #include <linux/module.h>
+#include <drm/drm_drv.h>

  #include "amdgpu.h"
  #include "amdgpu_ucode.h"
@@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)

  void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
  {
-    sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
+    if (!drm_dev_is_unplugged(&adev->ddev))
+            sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
  }

  static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
--
2.7.4









_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-25 19:34                             ` Andrey Grodzovsky
@ 2020-11-27 14:59                               ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-27 14:59 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher,
	yuq825, christian.koenig

On Wed, Nov 25, 2020 at 02:34:44PM -0500, Andrey Grodzovsky wrote:
> 
> On 11/25/20 11:36 AM, Daniel Vetter wrote:
> > On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
> > > Am 25.11.20 um 11:40 schrieb Daniel Vetter:
> > > > On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> > > > > Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > > > > > On 11/24/20 2:41 AM, Christian König wrote:
> > > > > > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > > > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > > > > > what you are doing with this function.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Christian.
> > > > > > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > > > > > notifier per device." to see
> > > > > > > > > > > > how i use it. I don't see why this should go
> > > > > > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > > > > > registered from within TTM
> > > > > > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > > > > > No, that is really vendor specific.
> > > > > > > > > > > 
> > > > > > > > > > > What I meant is to have a function like
> > > > > > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > > > > > to call and all tt objects are unpopulated.
> > > > > > > > > > So instead of this BO list i create and later iterate in
> > > > > > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > > > > > within
> > > > > > > > > > TTM with a single function ? Makes much more sense.
> > > > > > > > > Yes, exactly.
> > > > > > > > > 
> > > > > > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > > > > > actually not the best idea, we should now check the
> > > > > > > > > pin_count instead. This way we could also have a list of the
> > > > > > > > > pinned BOs in TTM.
> > > > > > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > > > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > > > > > It's probably a good idea to combine both iterations into this
> > > > > > > > new function to cover all the BOs allocated on the device.
> > > > > > > Yes, that's what I had in my mind as well.
> > > > > > > 
> > > > > > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > > > > > could have unforeseen consequences.
> > > > > > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > > > > > accesses once we drop all the DMA backing pages for a particular
> > > > > > > > BO ? Because for user mappings
> > > > > > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > > > > > nothing was done for in kernel CPU mappings.
> > > > > > > Yes exactly that.
> > > > > > > 
> > > > > > > In other words what happens if we free the ring buffer while the
> > > > > > > kernel still writes to it?
> > > > > > > 
> > > > > > > Christian.
> > > > > > While we can't control user application accesses to the mapped buffers
> > > > > > explicitly and hence we use page fault rerouting
> > > > > > I am thinking that in this case we may be able to sprinkle
> > > > > > drm_dev_enter/exit in any such sensitive place where we might
> > > > > > CPU access a DMA buffer from the kernel ?
> > > > > Yes, I fear we are going to need that.
> > > > Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
> > > > could stuff this into begin/end_cpu_access
> 
> 
> Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access
> driver specific hook ?
> 
> 
> > > > (but only for the kernel, so a
> > > > bit tricky)?
> 
> 
> Why only kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by
> some user process ? And  if we do need this distinction I think we should be able to
> differentiate by looking at current->mm (i.e. mm_struct) pointer being NULL
> for kernel thread.

Userspace mmap is handled by punching out the pte. So we don't need to do
anything special there.

For kernel mmap the begin/end should be all in the same context (so we
could use the srcu lock that works underneath drm_dev_enter/exit), since
at least right now kernel vmaps of dma-buf are very long-lived.

But the good news is that Thomas Zimmerman is working on this problem
already for different reasons, so it might be that we won't have any
long-lived kernel vmap anymore. And we could put the drm_dev_enter/exit in
there.
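For reference, the drm_dev_enter/exit guard being discussed looks roughly like
this in a driver. The helper name and body are hypothetical; the point is only
that the device access is skipped once the device is unplugged:

```c
/* Hypothetical sketch of guarding an in-kernel CPU access to a device
 * buffer with drm_dev_enter()/drm_dev_exit(). drm_dev_enter() returns
 * false once drm_dev_unplug() has been called, and the SRCU read section
 * it opens keeps the unplug path from completing mid-access. */
static void my_ring_write(struct amdgpu_device *adev, u32 value)
{
	int idx;

	if (!drm_dev_enter(&adev->ddev, &idx))
		return;	/* device gone - skip the ring/MMIO access */

	/* ... CPU write to the ring buffer / mapped BO goes here ... */

	drm_dev_exit(idx);
}
```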

> > > Oh very very good point! I haven't thought about DMA-buf mmaps in this
> > > context yet.
> > > 
> > > 
> > > > btw the other issue with dma-buf (and even worse with dma_fence) is
> > > > refcounting of the underlying drm_device. I'd expect that all your
> > > > callbacks go boom if the dma_buf outlives your drm_device. That part isn't
> > > > yet solved in your series here.
> > > Well thinking more about this, it seems to be another really good argument
> > > why mapping pages from DMA-bufs into application address space directly is a
> > > very bad idea :)
> > > 
> > > But yes, we essentially can't remove the device as long as there is a
> > > DMA-buf with mappings. No idea how to clean that one up.
> > drm_dev_get/put in drm_prime helpers should get us like 90% there I think.
> 
> 
> What are the other 10% ?

dma_fence, which is also about 90% of the work probably. But I'm
guesstimating only 10% of the oopses you can hit. Since generally the
dma_fences for a buffer don't outlive the underlying buffer. So usually no
problems happen when we've solved the dma-buf sharing, but the dma_fence
can outlive the dma-buf, so there's still possibilities of crashing.

> > The even more worrying thing is random dma_fence attached to the dma_resv
> > object. We could try to clean all of ours up, but they could have escaped
> > already into some other driver. And since we're talking about egpu
> > hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case.
> > 
> > I have no idea how to fix that one :-/
> > -Daniel
> 
> 
> I assume you are referring to sync_file_create/sync_file_get_fence API  for
> dma_fence export/import ?

So dma_fence is a general issue, there's a pile of interfaces that result
in sharing with other drivers:
- dma_resv in the dma_buf
- sync_file
- drm_syncobj (but I think that's not yet cross driver, but probably
  changes)

In each of these cases drivers can pick up the dma_fence and use it
internally for all kinds of purposes (could end up in the scheduler or
wherever).

> So with DMA bufs we have the drm_gem_object as exporter specific private data
> and so we can do drm_dev_get and put at the drm_gem_object layer to bind
> device life cycle
> to that of each GEM object but, we don't have such mid-layer for dma_fence
> which could allow
> us to increment device reference for each fence out there related to that
> device - is my understanding correct ?

Yeah that's the annoying part with dma-fence. No existing generic place to
put the drm_dev_get/put. tbf I'd note this as a todo and try to solve the
other problems first.
-Daniel
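As a rough illustration of the drm_dev_get/put idea for the dma-buf case above
(a sketch with a hypothetical export callback; the matching drm_dev_put() would
go in the dma-buf release path):

```c
/* Sketch: take a drm_device reference when a GEM object is exported as a
 * dma-buf, so the device structure outlives any importer's mappings.
 * my_gem_prime_export is hypothetical; drm_gem_prime_export() is the
 * stock DRM helper. */
static struct dma_buf *my_gem_prime_export(struct drm_gem_object *obj,
					   int flags)
{
	struct dma_buf *buf = drm_gem_prime_export(obj, flags);

	if (!IS_ERR(buf))
		drm_dev_get(obj->dev);	/* dropped via drm_dev_put() on release */

	return buf;
}
```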

> 
> Andrey
> 
> 
> Andrey
> 
> 
> > > Christian.
> > > 
> > > > -Daniel
> > > > 
> > > > > > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > > > > > Is there other places ?
> > > > > Puh, good question. I have no idea.
> > > > > 
> > > > > > Another point is that at this point the driver shouldn't access any such
> > > > > > buffers as we are in the process of finishing the device.
> > > > > > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > > > > > think there is anything else to do ?
> > > > > Well there is a page fault handler for kernel mappings, but that one just
> > > > > prints the stack trace into the system log and calls BUG(); :)
> > > > > 
> > > > > Long story short we need to avoid any access to released pages after unplug.
> > > > > No matter if it's from the kernel or userspace.
> > > > > 
> > > > > Regards,
> > > > > Christian.
> > > > > 
> > > > > > Andrey

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
  2020-11-25 17:39           ` Andrey Grodzovsky
@ 2020-11-27 15:04             ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-27 15:04 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Christian König, dri-devel, amd-gfx list, Greg KH,
	Alex Deucher, Qiang Yu

On Wed, Nov 25, 2020 at 12:39:47PM -0500, Andrey Grodzovsky wrote:
> 
> On 11/25/20 4:04 AM, Daniel Vetter wrote:
> > On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky@amd.com> wrote:
> > > 
> > > On 11/24/20 9:49 AM, Daniel Vetter wrote:
> > > > On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
> > > > > Avoids NULL ptr due to kobj->sd being unset on device removal.
> > > > > 
> > > > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > > ---
> > > > >    drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
> > > > >    drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
> > > > >    2 files changed, 6 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > > > > index caf828a..812e592 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > > > > @@ -27,6 +27,7 @@
> > > > >    #include <linux/uaccess.h>
> > > > >    #include <linux/reboot.h>
> > > > >    #include <linux/syscalls.h>
> > > > > +#include <drm/drm_drv.h>
> > > > > 
> > > > >    #include "amdgpu.h"
> > > > >    #include "amdgpu_ras.h"
> > > > > @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
> > > > >               .attrs = attrs,
> > > > >       };
> > > > > 
> > > > > -    sysfs_remove_group(&adev->dev->kobj, &group);
> > > > > +    if (!drm_dev_is_unplugged(&adev->ddev))
> > > > > +            sysfs_remove_group(&adev->dev->kobj, &group);
> > > > This looks wrong. sysfs, like any other interface, should be
> > > > unconditionally thrown out when we do the drm_dev_unregister. Whether
> > > > hotunplugged or not shouldn't matter at all. Either this isn't needed at all,
> > > > or something is wrong with the ordering here. But definitely fishy.
> > > > -Daniel
> > > 
> > > So technically this is needed because the kobject's sysfs directory entry
> > > kobj->sd is set to NULL on device removal (from sysfs_remove_dir), but
> > > because we don't finalize the device until the last reference to the drm
> > > file is dropped (which can happen later), we end up calling
> > > sysfs_remove_file/dir after this pointer is NULL. sysfs_remove_file checks
> > > for NULL and aborts, while sysfs_remove_dir does not, and that's why I
> > > guard against calls to sysfs_remove_dir.
> > > But indeed the whole approach in the driver is incorrect, as Greg pointed
> > > out - we should use default attribute groups instead of explicit calls to
> > > the sysfs interface, and this would save those troubles.
> > > But again, the issue here is scope of work: converting all of amdgpu to
> > > default attribute groups is a somewhat lengthy process with extra testing,
> > > as the entire driver is peppered with sysfs references, and it seems to me
> > > more of a standalone cleanup, just like the switch to devm_ and drmm_. To
> > > me at least it makes more sense to finalize and push the hot unplug
> > > patches so that this new functionality can be part of the driver sooner,
> > > and then incrementally improve it by working on those other topics. Just
> > > as with devm_/drmm_, I also added sysfs cleanup to my TODO list in the
> > > RFC patch.
> > Hm, whether you solve this with the default group stuff to
> > auto-remove, or remove explicitly at the right time doesn't matter
> > much. The underlying problem you have here is that it's done way too
> > late.
> 
> As far as I understand from reading this article by Greg -
> https://www.linux.com/news/how-create-sysfs-file-correctly/ - the default
> group attrs will be removed together with the device, not too late like now,
> and I quote from the last paragraph there:
> 
> "By setting this value, you don’t have to do anything in your
> probe() or release() functions at all in order for the
> sysfs files to be properly created and destroyed whenever your
> device is added or removed from the system. And you will, most
> importantly, do it in a race-free manner, which is always a good thing."
> 
> To me this seems like the best solution to the late remove issue. What do
> you think ?
> 
> 
> >   sysfs removal (like all uapi interfaces) needs to happen as
> > part of drm_dev_unregister.
> 
> 
> Do you mean we need to trace and aggregate all sysfs files creation within
> the low level drivers and then call some sysfs release function inside
> drm_dev_unregister
> to iterate and release them all ?

That would just reinvent the proper solution Greg explained above. For now
I think you just need some driver callback that you call right after
drm_dev_unplug (or drm_dev_unregister) to clean up these sysfs interfaces.
Afaiui the important part is to clean up your additional interfaces from
the ->remove callback, since at that point the core sysfs stuff still
exists.

Maybe you want to do another loop over all IP blocks and a ->unregister
callback, or maybe it's just 1-2 cases you call directly.
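[Editorial note: the race-free pattern from Greg's article referenced above is the driver core's default attribute groups mechanism. A minimal sketch follows; the attribute and driver names (foo, foo_pci_driver) are made up for illustration and are not the actual amdgpu attributes.]

```c
/* Declarative sysfs attributes: the driver core creates and removes
 * these files itself, race-free, when the device is added or removed -
 * no sysfs_create_group()/sysfs_remove_group() calls in probe()/remove().
 */
static ssize_t foo_show(struct device *dev,
			struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "bar\n"); /* sprintf() on pre-5.10 kernels */
}
static DEVICE_ATTR_RO(foo);

static struct attribute *foo_attrs[] = {
	&dev_attr_foo.attr,
	NULL,
};
ATTRIBUTE_GROUPS(foo); /* generates foo_group and foo_groups */

/* Hook the groups into the driver instead of managing them by hand: */
static struct pci_driver foo_pci_driver = {
	.name              = "foo",
	.driver.dev_groups = foo_groups,
	/* .probe, .remove, .id_table, ... */
};
```

Because the driver core tears these files down as part of device removal itself, the late-removal problem the patch works around (kobj->sd already NULL by the time the driver's fini path runs) cannot arise.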

> >   I guess aside from the split into fini_hw
> > and fini_sw, you also need an unregister_late callback (like we have
> > already for drm_connector, so that e.g. backlight and similar stuff
> > can be unregistered).
> 
> 
> Is this the callback you suggest to call from within drm_dev_unregister and
> it will be responsible to release all sysfs files created within the driver ?

Nah that would be an amdgpu ip block callback (forgot what it's called,
too comfy to fire up an editor right now and look it up, but you have a
bunch of these loops all over).

I think the core solution we want is what Greg already laid out. This idea
here was just an amdgpu interim plan, if the core solution is a bit too
invasive to implement right away.
-Daniel

> 
> Andrey
> 
> 
> > 
> > Papering over the underlying bug like this doesn't really fix much,
> > the lifetimes are still wrong.
> > -Daniel
> > 
> > > Andrey
> > > 
> > > 
> > > > >       return 0;
> > > > >    }
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > > > > index 2b7c90b..54331fc 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > > > > @@ -24,6 +24,7 @@
> > > > >    #include <linux/firmware.h>
> > > > >    #include <linux/slab.h>
> > > > >    #include <linux/module.h>
> > > > > +#include <drm/drm_drv.h>
> > > > > 
> > > > >    #include "amdgpu.h"
> > > > >    #include "amdgpu_ucode.h"
> > > > > @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
> > > > > 
> > > > >    void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
> > > > >    {
> > > > > -    sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> > > > > +    if (!drm_dev_is_unplugged(&adev->ddev))
> > > > > +            sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> > > > >    }
> > > > > 
> > > > >    static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
> > > > > --
> > > > > 2.7.4
> > > > > 
> > 
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug
  2020-11-27 15:04             ` Daniel Vetter
@ 2020-11-27 15:34               ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-27 15:34 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: amd-gfx list, Christian König, dri-devel, Qiang Yu, Greg KH,
	Alex Deucher


On 11/27/20 10:04 AM, Daniel Vetter wrote:
> On Wed, Nov 25, 2020 at 12:39:47PM -0500, Andrey Grodzovsky wrote:
>> On 11/25/20 4:04 AM, Daniel Vetter wrote:
>>> On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky
>>> <Andrey.Grodzovsky@amd.com> wrote:
>>>> On 11/24/20 9:49 AM, Daniel Vetter wrote:
>>>>> On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
>>>>>> Avoids NULL ptr due to kobj->sd being unset on device removal.
>>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
>>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
>>>>>>     2 files changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>>>> index caf828a..812e592 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
>>>>>> @@ -27,6 +27,7 @@
>>>>>>     #include <linux/uaccess.h>
>>>>>>     #include <linux/reboot.h>
>>>>>>     #include <linux/syscalls.h>
>>>>>> +#include <drm/drm_drv.h>
>>>>>>
>>>>>>     #include "amdgpu.h"
>>>>>>     #include "amdgpu_ras.h"
>>>>>> @@ -1043,7 +1044,8 @@ static int amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
>>>>>>                .attrs = attrs,
>>>>>>        };
>>>>>>
>>>>>> -    sysfs_remove_group(&adev->dev->kobj, &group);
>>>>>> +    if (!drm_dev_is_unplugged(&adev->ddev))
>>>>>> +            sysfs_remove_group(&adev->dev->kobj, &group);
>>>>> This looks wrong. sysfs, like any other interface, should be
>>>>> unconditionally thrown out when we do the drm_dev_unregister. Whether
>>>>> hotunplugged or not should matter at all. Either this isn't needed at all,
>>>>> or something is wrong with the ordering here. But definitely fishy.
>>>>> -Daniel
>>>> So technically this is needed because the kobject's sysfs directory entry
>>>> kobj->sd is set to NULL on device removal (from sysfs_remove_dir), but
>>>> because we don't finalize the device until the last reference to the drm
>>>> file is dropped (which can happen later), we end up calling
>>>> sysfs_remove_file/dir after this pointer is NULL. sysfs_remove_file checks
>>>> for NULL and aborts, while sysfs_remove_dir does not, and that's why I
>>>> guard against calls to sysfs_remove_dir.
>>>> But indeed the whole approach in the driver is incorrect, as Greg pointed
>>>> out - we should use default attribute groups instead of explicit calls to
>>>> the sysfs interface, and this would save those troubles.
>>>> But again, the issue here is scope of work: converting all of amdgpu to
>>>> default attribute groups is a somewhat lengthy process with extra testing,
>>>> as the entire driver is peppered with sysfs references, and it seems to me
>>>> more of a standalone cleanup, just like the switch to devm_ and drmm_. To
>>>> me at least it makes more sense to finalize and push the hot unplug
>>>> patches so that this new functionality can be part of the driver sooner,
>>>> and then incrementally improve it by working on those other topics. Just
>>>> as with devm_/drmm_, I also added sysfs cleanup to my TODO list in the
>>>> RFC patch.
>>> Hm, whether you solve this with the default group stuff to
>>> auto-remove, or remove explicitly at the right time doesn't matter
>>> much. The underlying problem you have here is that it's done way too
>>> late.
>> As far as I understand from reading this article by Greg -
>> https://www.linux.com/news/how-create-sysfs-file-correctly/ - the default
>> group attrs will be removed together with the device, not too late like now,
>> and I quote from the last paragraph there:
>>
>> "By setting this value, you don’t have to do anything in your
>> probe() or release() functions at all in order for the
>> sysfs files to be properly created and destroyed whenever your
>> device is added or removed from the system. And you will, most
>> importantly, do it in a race-free manner, which is always a good thing."
>>
>> To me this seems like the best solution to the late remove issue. What do
>> you think ?
>>
>>
>>>    sysfs removal (like all uapi interfaces) needs to happen as
>>> part of drm_dev_unregister.
>>
>> Do you mean we need to trace and aggregate all sysfs files creation within
>> the low level drivers and then call some sysfs release function inside
>> drm_dev_unregister
>> to iterate and release them all ?
> That would just reinvent the proper solution Greg explained above. For now
> I think you just need some driver callback that you call right after
> drm_dev_unplug (or drm_dev_unregister) to clean up these sysfs interfaces.
> Afaiui the important part is to clean up your additional interfaces from
> the ->remove callback, since at that point the core sysfs stuff still
> exists.
>
> Maybe you want to do another loop over all IP blocks and a ->unregister
> callback, or maybe it's just 1-2 cases you call directly.


Most of them are buried within non-IP-block entities (e.g.
amdgpu_device_fini->amdgpu_atombios_fini->device_remove_file->sysfs_remove_file,
or a much longer chain in kfd, like
amdgpu_device_fini->.....kfd_remove_sysfs_node_entry->kfd_remove_sysfs_file->sysfs_remove_file),
and so they will need to be accessed explicitly by creating accessor functions
in their public APIs across multiple layers.


>
>>>    I guess aside from the split into fini_hw
>>> and fini_sw, you also need an unregister_late callback (like we have
>>> already for drm_connector, so that e.g. backlight and similar stuff
>>> can be unregistered).
>>
>> Is this the callback you suggest to call from within drm_dev_unregister and
>> it will be responsible to release all sysfs files created within the driver ?
> Nah that would be an amdgpu ip block callback (forgot what it's called,
> too comfy to fire up an editor right now and look it up, but you have a
> bunch of these loops all over).
>
> I think the core solution we want is what Greg already laid out. This idea
> here was just an amdgpu interim plan, if the core solution is a bit too
> invasive to implement right away.
> -Daniel


From what I showed above, it looks to me like the interim solution might not
really be less invasive than the proper solution from Greg. So if you feel this
is a blocker for the entire patch set and we absolutely can't live with the
temporary band-aid this patch represents, then I will just do the real solution
as a standalone patch set, because I think it is too big a change on its own to
combine with the hot device unplug topic.

Andrey


>
>> Andrey
>>
>>
>>> Papering over the underlying bug like this doesn't really fix much,
>>> the lifetimes are still wrong.
>>> -Daniel
>>>
>>>> Andrey
>>>>
>>>>
>>>>>>        return 0;
>>>>>>     }
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>>>>>> index 2b7c90b..54331fc 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
>>>>>> @@ -24,6 +24,7 @@
>>>>>>     #include <linux/firmware.h>
>>>>>>     #include <linux/slab.h>
>>>>>>     #include <linux/module.h>
>>>>>> +#include <drm/drm_drv.h>
>>>>>>
>>>>>>     #include "amdgpu.h"
>>>>>>     #include "amdgpu_ucode.h"
>>>>>> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
>>>>>>
>>>>>>     void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
>>>>>>     {
>>>>>> -    sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>>>>>> +    if (!drm_dev_is_unplugged(&adev->ddev))
>>>>>> +            sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
>>>>>>     }
>>>>>>
>>>>>>     static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
>>>>>> --
>>>>>> 2.7.4
>>>>>>
>>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-27 14:59                               ` Daniel Vetter
@ 2020-11-27 16:04                                 ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-11-27 16:04 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher,
	yuq825, christian.koenig


On 11/27/20 9:59 AM, Daniel Vetter wrote:
> On Wed, Nov 25, 2020 at 02:34:44PM -0500, Andrey Grodzovsky wrote:
>> On 11/25/20 11:36 AM, Daniel Vetter wrote:
>>> On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
>>>> Am 25.11.20 um 11:40 schrieb Daniel Vetter:
>>>>> On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
>>>>>> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
>>>>>>> On 11/24/20 2:41 AM, Christian König wrote:
>>>>>>>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
>>>>>>>>> On 11/23/20 3:41 PM, Christian König wrote:
>>>>>>>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>>>>>>>>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>>>>>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>>>>>>>>>> before device's IOMMU group is released.
>>>>>>>>>>>>>> It would be cleaner if we could do the whole
>>>>>>>>>>>>>> handling in TTM. I also need to double check
>>>>>>>>>>>>>> what you are doing with this function.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology
>>>>>>>>>>>>> notifier per device." to see
>>>>>>>>>>>>> how i use it. I don't see why this should go
>>>>>>>>>>>>> into TTM mid-layer - the stuff I do inside
>>>>>>>>>>>>> is vendor specific and also I don't think TTM is
>>>>>>>>>>>>> explicitly aware of IOMMU ?
>>>>>>>>>>>>> Do you mean you prefer the IOMMU notifier to be
>>>>>>>>>>>>> registered from within TTM
>>>>>>>>>>>>> and then use a hook to call into vendor specific handler ?
>>>>>>>>>>>> No, that is really vendor specific.
>>>>>>>>>>>>
>>>>>>>>>>>> What I meant is to have a function like
>>>>>>>>>>>> ttm_resource_manager_evict_all() which you only need
>>>>>>>>>>>> to call and all tt objects are unpopulated.
>>>>>>>>>>> So instead of this BO list i create and later iterate in
>>>>>>>>>>> amdgpu from the IOMMU patch you just want to do it
>>>>>>>>>>> within
>>>>>>>>>>> TTM with a single function ? Makes much more sense.
>>>>>>>>>> Yes, exactly.
>>>>>>>>>>
>>>>>>>>>> The list_empty() checks we have in TTM for the LRU are
>>>>>>>>>> actually not the best idea, we should now check the
>>>>>>>>>> pin_count instead. This way we could also have a list of the
>>>>>>>>>> pinned BOs in TTM.
>>>>>>>>> So from my IOMMU topology handler I will iterate the TTM LRU for
>>>>>>>>> the unpinned BOs and this new function for the pinned ones  ?
>>>>>>>>> It's probably a good idea to combine both iterations into this
>>>>>>>>> new function to cover all the BOs allocated on the device.
>>>>>>>> Yes, that's what I had in my mind as well.
>>>>>>>>
>>>>>>>>>> BTW: Have you thought about what happens when we unpopulate
>>>>>>>>>> a BO while we still try to use a kernel mapping for it? That
>>>>>>>>>> could have unforeseen consequences.
>>>>>>>>> Are you asking what happens to kmap or vmap style mapped CPU
>>>>>>>>> accesses once we drop all the DMA backing pages for a particular
>>>>>>>>> BO ? Because for user mappings
>>>>>>>>> (mmap) we took care of this with dummy page reroute but indeed
>>>>>>>>> nothing was done for in kernel CPU mappings.
>>>>>>>> Yes exactly that.
>>>>>>>>
>>>>>>>> In other words what happens if we free the ring buffer while the
>>>>>>>> kernel still writes to it?
>>>>>>>>
>>>>>>>> Christian.
>>>>>>> While we can't control user application accesses to the mapped buffers
>>>>>>> explicitly and hence we use page fault rerouting
>>>>>>> I am thinking that in this  case we may be able to sprinkle
>>>>>>> drm_dev_enter/exit in any such sensitive place were we might
>>>>>>> CPU access a DMA buffer from the kernel ?
>>>>>> Yes, I fear we are going to need that.
>>>>> Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
>>>>> could stuff this into begin/end_cpu_access
>>
>> Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access
>> driver specific hook ?
>>
>>
>>>>> (but only for the kernel, so a
>>>>> bit tricky)?
>>
>> Why only kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by
>> some user process ? And  if we do need this distinction I think we should be able to
>> differentiate by looking at current->mm (i.e. mm_struct) pointer being NULL
>> for kernel thread.
> Userspace mmap is handled by punching out the pte. So we don't need to do
> anything special there.
>
> For kernel mmap the begin/end should be all in the same context (so we
> could use the srcu lock that works underneath drm_dev_enter/exit), since
> at least right now kernel vmaps of dma-buf are very long-lived.


If by same context you mean the right drm_device (the exporter's one),
then this should be OK, as I see from the amdgpu implementation
of the callback, amdgpu_dma_buf_begin_cpu_access. We just need to add a
handler for the .end_cpu_access callback to call drm_dev_exit there.
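As a compilable sketch of that proposal (the drm_dev_enter/drm_dev_exit signatures follow the real DRM API, but the one-flag device model, the counter standing in for the SRCU read-side section, and the example_* callbacks are assumptions for illustration, not the actual amdgpu hooks):

```c
#include <assert.h>
#include <stdbool.h>

struct drm_device {
	bool unplugged;
};

/* Stand-in for the SRCU read-side section behind drm_dev_enter/exit:
 * a plain counter instead of srcu_read_lock()/srcu_read_unlock(). */
static int read_side_sections;

static bool drm_dev_enter(struct drm_device *dev, int *idx)
{
	if (dev->unplugged)
		return false;
	*idx = read_side_sections++;
	return true;
}

static void drm_dev_exit(int idx)
{
	(void)idx;
	read_side_sections--;
}

/* Sketch of the proposal: fail CPU access once the device is gone. */
static int example_begin_cpu_access(struct drm_device *dev, int *idx)
{
	if (!drm_dev_enter(dev, idx))
		return -19;	/* -ENODEV */
	/* ... prepare the buffer for CPU access ... */
	return 0;
}

static void example_end_cpu_access(struct drm_device *dev, int idx)
{
	(void)dev;
	/* ... finish CPU access ... */
	drm_dev_exit(idx);
}
```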

Andrey


>
> But the good news is that Thomas Zimmerman is working on this problem
> already for different reasons, so it might be that we won't have any
> long-lived kernel vmap anymore. And we could put the drm_dev_enter/exit in
> there.
>
>>>> Oh very very good point! I haven't thought about DMA-buf mmaps in this
>>>> context yet.
>>>>
>>>>
>>>>> btw the other issue with dma-buf (and even worse with dma_fence) is
>>>>> refcounting of the underlying drm_device. I'd expect that all your
>>>>> callbacks go boom if the dma_buf outlives your drm_device. That part isn't
>>>>> yet solved in your series here.
>>>> Well thinking more about this, it seems to be a another really good argument
>>>> why mapping pages from DMA-bufs into application address space directly is a
>>>> very bad idea :)
>>>>
>>>> But yes, we essentially can't remove the device as long as there is a
>>>> DMA-buf with mappings. No idea how to clean that one up.
>>> drm_dev_get/put in drm_prime helpers should get us like 90% there I think.
>>
>> What are the other 10% ?
> dma_fence, which is also about 90% of the work probably. But I'm
> guesstimating only 10% of the oopses you can hit. Since generally the
> dma_fence for a buffer don't outlive the underlying buffer. So usually no
> problems happen when we've solved the dma-buf sharing, but the dma_fence
> can outlive the dma-buf, so there's still possibilities of crashing.
>
>>> The even more worrying thing is random dma_fence attached to the dma_resv
>>> object. We could try to clean all of ours up, but they could have escaped
>>> already into some other driver. And since we're talking about egpu
>>> hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case.
>>>
>>> I have no how to fix that one :-/
>>> -Daniel
>>
>> I assume you are referring to sync_file_create/sync_file_get_fence API  for
>> dma_fence export/import ?
> So dma_fence is a general issue, there's a pile of interfaces that result
> in sharing with other drivers:
> - dma_resv in the dma_buf
> - sync_file
> - drm_syncobj (but I think that's not yet cross driver, but probably
>    changes)
>
> In each of these cases drivers can pick up the dma_fence and use it
> internally for all kinds of purposes (could end up in the scheduler or
> wherever).
>
>> So with DMA bufs we have the drm_gem_object as exporter specific private data
>> and so we can do drm_dev_get and put at the drm_gem_object layer to bind
>> device life cycle
>> to that of each GEM object but, we don't have such mid-layer for dma_fence
>> which could allow
>> us to increment device reference for each fence out there related to that
>> device - is my understanding correct ?
> Yeah that's the annoying part with dma-fence. No existing generic place to
> put the drm_dev_get/put. tbf I'd note this as a todo and try to solve the
> other problems first.
> -Daniel
>
>> Andrey
>>
>>
>> Andrey
>>
>>
>>>> Christian.
>>>>
>>>>> -Daniel
>>>>>
>>>>>>> Things like CPU page table updates, ring buffer accesses and FW memcpy ?
>>>>>>> Is there other places ?
>>>>>> Puh, good question. I have no idea.
>>>>>>
>>>>>>> Another point is that at this point the driver shouldn't access any such
>>>>>>> buffers as we are at the process finishing the device.
>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I don't
>>>>>>> think there is anything else to do ?
>>>>>> Well there is a page fault handler for kernel mappings, but that one just
>>>>>> prints the stack trace into the system log and calls BUG(); :)
>>>>>>
>>>>>> Long story short we need to avoid any access to released pages after unplug.
>>>>>> No matter if it's from the kernel or userspace.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> Andrey
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-27 16:04                                 ` Andrey Grodzovsky
@ 2020-11-30 14:15                                   ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-11-30 14:15 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher,
	yuq825, christian.koenig

On Fri, Nov 27, 2020 at 11:04:55AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/27/20 9:59 AM, Daniel Vetter wrote:
> > On Wed, Nov 25, 2020 at 02:34:44PM -0500, Andrey Grodzovsky wrote:
> > > On 11/25/20 11:36 AM, Daniel Vetter wrote:
> > > > On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
> > > > > Am 25.11.20 um 11:40 schrieb Daniel Vetter:
> > > > > > On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> > > > > > > Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > > > > > > > On 11/24/20 2:41 AM, Christian König wrote:
> > > > > > > > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > > > > > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > > > > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > > > > > > > what you are doing with this function.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Christian.
> > > > > > > > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > > > > > > > notifier per device." to see
> > > > > > > > > > > > > > how i use it. I don't see why this should go
> > > > > > > > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > > > > > > > registered from within TTM
> > > > > > > > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > > > > > > > No, that is really vendor specific.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > What I meant is to have a function like
> > > > > > > > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > > > > > > > to call and all tt objects are unpopulated.
> > > > > > > > > > > > So instead of this BO list i create and later iterate in
> > > > > > > > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > > > > > > > within
> > > > > > > > > > > > TTM with a single function ? Makes much more sense.
> > > > > > > > > > > Yes, exactly.
> > > > > > > > > > > 
> > > > > > > > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > > > > > > > actually not the best idea, we should now check the
> > > > > > > > > > > pin_count instead. This way we could also have a list of the
> > > > > > > > > > > pinned BOs in TTM.
> > > > > > > > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > > > > > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > > > > > > > It's probably a good idea to combine both iterations into this
> > > > > > > > > > new function to cover all the BOs allocated on the device.
> > > > > > > > > Yes, that's what I had in my mind as well.
> > > > > > > > > 
> > > > > > > > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > > > > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > > > > > > > could have unforeseen consequences.
> > > > > > > > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > > > > > > > accesses once we drop all the DMA backing pages for a particular
> > > > > > > > > > BO ? Because for user mappings
> > > > > > > > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > > > > > > > nothing was done for in kernel CPU mappings.
> > > > > > > > > Yes exactly that.
> > > > > > > > > 
> > > > > > > > > In other words what happens if we free the ring buffer while the
> > > > > > > > > kernel still writes to it?
> > > > > > > > > 
> > > > > > > > > Christian.
> > > > > > > > While we can't control user application accesses to the mapped buffers
> > > > > > > > explicitly and hence we use page fault rerouting
> > > > > > > > I am thinking that in this  case we may be able to sprinkle
> > > > > > > > drm_dev_enter/exit in any such sensitive place were we might
> > > > > > > > CPU access a DMA buffer from the kernel ?
> > > > > > > Yes, I fear we are going to need that.
> > > > > > Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
> > > > > > could stuff this into begin/end_cpu_access
> > > 
> > > Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access
> > > driver specific hook ?
> > > 
> > > 
> > > > > > (but only for the kernel, so a
> > > > > > bit tricky)?
> > > 
> > > Why only kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by
> > > some user process ? And  if we do need this distinction I think we should be able to
> > > differentiate by looking at current->mm (i.e. mm_struct) pointer being NULL
> > > for kernel thread.
> > Userspace mmap is handled by punching out the pte. So we don't need to do
> > anything special there.
> > 
> > For kernel mmap the begin/end should be all in the same context (so we
> > could use the srcu lock that works underneath drm_dev_enter/exit), since
> > at least right now kernel vmaps of dma-buf are very long-lived.
> 
> 
> If by same context you mean the right drm_device (the exporter's one)
> then this should be ok as I am seeing from amdgpu implementation
> of the callback - amdgpu_dma_buf_begin_cpu_access. We just need to add
> handler for .end_cpu_access callback to call drm_dev_exit there.

Same context = same system call essentially. You cannot hold locks while
returning to userspace. And current userspace can call the
begin/end_cpu_access callbacks through ioctls, so just putting a
drm_dev_enter/exit in them will break really badly. IIRC there's an igt
for testing these ioctls too - if there isn't, we really should have one.

Hence we need to be more careful here about who's calling and where we
can put the drm_dev_enter/exit.
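A small compilable model of the constraint (the counter stands in for the SRCU read-side section, and example_kernel_cpu_access is a hypothetical in-kernel user, not real driver code): the enter/exit pair must open and close within a single kernel call, which is exactly what an ioctl-split begin/end cannot guarantee.

```c
#include <assert.h>
#include <stdbool.h>

struct drm_device {
	bool unplugged;
};

/* Stand-in for the SRCU read-side count behind drm_dev_enter/exit. */
static int read_side_sections;

static bool drm_dev_enter(struct drm_device *dev, int *idx)
{
	if (dev->unplugged)
		return false;
	*idx = read_side_sections++;
	return true;
}

static void drm_dev_exit(int idx)
{
	(void)idx;
	read_side_sections--;
}

/* In-kernel access pattern: enter and exit pair within one call, so no
 * read-side section is ever held across a return to userspace. */
static int example_kernel_cpu_access(struct drm_device *dev)
{
	int idx;

	if (!drm_dev_enter(dev, &idx))
		return -19;	/* -ENODEV: device already unplugged */
	/* ... touch the long-lived kernel vmap of the dma-buf ... */
	drm_dev_exit(idx);
	return 0;
}
```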
-Daniel

> 
> Andrey
> 
> 
> > 
> > But the good news is that Thomas Zimmerman is working on this problem
> > already for different reasons, so it might be that we won't have any
> > long-lived kernel vmap anymore. And we could put the drm_dev_enter/exit in
> > there.
> > 
> > > > > Oh very very good point! I haven't thought about DMA-buf mmaps in this
> > > > > context yet.
> > > > > 
> > > > > 
> > > > > > btw the other issue with dma-buf (and even worse with dma_fence) is
> > > > > > refcounting of the underlying drm_device. I'd expect that all your
> > > > > > callbacks go boom if the dma_buf outlives your drm_device. That part isn't
> > > > > > yet solved in your series here.
> > > > > Well thinking more about this, it seems to be a another really good argument
> > > > > why mapping pages from DMA-bufs into application address space directly is a
> > > > > very bad idea :)
> > > > > 
> > > > > But yes, we essentially can't remove the device as long as there is a
> > > > > DMA-buf with mappings. No idea how to clean that one up.
> > > > drm_dev_get/put in drm_prime helpers should get us like 90% there I think.
> > > 
> > > What are the other 10% ?
> > dma_fence, which is also about 90% of the work probably. But I'm
> > guesstimating only 10% of the oopses you can hit. Since generally the
> > dma_fence for a buffer don't outlive the underlying buffer. So usually no
> > problems happen when we've solved the dma-buf sharing, but the dma_fence
> > can outlive the dma-buf, so there's still possibilities of crashing.
> > 
> > > > The even more worrying thing is random dma_fence attached to the dma_resv
> > > > object. We could try to clean all of ours up, but they could have escaped
> > > > already into some other driver. And since we're talking about egpu
> > > > hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case.
> > > > 
> > > > I have no how to fix that one :-/
> > > > -Daniel
> > > 
> > > I assume you are referring to sync_file_create/sync_file_get_fence API  for
> > > dma_fence export/import ?
> > So dma_fence is a general issue, there's a pile of interfaces that result
> > in sharing with other drivers:
> > - dma_resv in the dma_buf
> > - sync_file
> > - drm_syncobj (but I think that's not yet cross driver, but probably
> >    changes)
> > 
> > In each of these cases drivers can pick up the dma_fence and use it
> > internally for all kinds of purposes (could end up in the scheduler or
> > wherever).
> > 
> > > So with DMA bufs we have the drm_gem_object as exporter specific private data
> > > and so we can do drm_dev_get and put at the drm_gem_object layer to bind
> > > device life cycle
> > > to that of each GEM object but, we don't have such mid-layer for dma_fence
> > > which could allow
> > > us to increment device reference for each fence out there related to that
> > > device - is my understanding correct ?
> > Yeah that's the annoying part with dma-fence. No existing generic place to
> > put the drm_dev_get/put. tbf I'd note this as a todo and try to solve the
> > other problems first.
> > -Daniel
> > 
> > > Andrey
> > > 
> > > 
> > > Andrey
> > > 
> > > 
> > > > > Christian.
> > > > > 
> > > > > > -Daniel
> > > > > > 
> > > > > > > > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > > > > > > > Is there other places ?
> > > > > > > Puh, good question. I have no idea.
> > > > > > > 
> > > > > > > > Another point is that at this point the driver shouldn't access any such
> > > > > > > > buffers as we are at the process finishing the device.
> > > > > > > > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > > > > > > > think there is anything else to do ?
> > > > > > > Well there is a page fault handler for kernel mappings, but that one just
> > > > > > > prints the stack trace into the system log and calls BUG(); :)
> > > > > > > 
> > > > > > > Long story short we need to avoid any access to released pages after unplug.
> > > > > > > No matter if it's from the kernel or userspace.
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Christian.
> > > > > > > 
> > > > > > > > Andrey

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 212+ messages in thread

> > > > > > > > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > > > > > > > within
> > > > > > > > > > > > TTM with a single function ? Makes much more sense.
> > > > > > > > > > > Yes, exactly.
> > > > > > > > > > > 
> > > > > > > > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > > > > > > > actually not the best idea, we should now check the
> > > > > > > > > > > pin_count instead. This way we could also have a list of the
> > > > > > > > > > > pinned BOs in TTM.
> > > > > > > > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > > > > > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > > > > > > > It's probably a good idea to combine both iterations into this
> > > > > > > > > > new function to cover all the BOs allocated on the device.
> > > > > > > > > Yes, that's what I had in my mind as well.
> > > > > > > > > 
> > > > > > > > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > > > > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > > > > > > > could have unforeseen consequences.
> > > > > > > > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > > > > > > > accesses once we drop all the DMA backing pages for a particular
> > > > > > > > > > BO ? Because for user mappings
> > > > > > > > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > > > > > > > nothing was done for in kernel CPU mappings.
> > > > > > > > > Yes exactly that.
> > > > > > > > > 
> > > > > > > > > In other words what happens if we free the ring buffer while the
> > > > > > > > > kernel still writes to it?
> > > > > > > > > 
> > > > > > > > > Christian.
> > > > > > > > While we can't control user application accesses to the mapped buffers
> > > > > > > > explicitly and hence we use page fault rerouting
> > > > > > > > I am thinking that in this  case we may be able to sprinkle
> > > > > > > > drm_dev_enter/exit in any such sensitive place were we might
> > > > > > > > CPU access a DMA buffer from the kernel ?
> > > > > > > Yes, I fear we are going to need that.
> > > > > > Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
> > > > > > could stuff this into begin/end_cpu_access
> > > 
> > > Do you mean guarding with drm_dev_enter/exit in dma_buf_ops.begin/end_cpu_access
> > > driver specific hook ?
> > > 
> > > 
> > > > > > (but only for the kernel, so a
> > > > > > bit tricky)?
> > > 
> > > Why only kernel ? Why is it a problem to do it if it comes from dma_buf_ioctl by
> > > some user process ? And  if we do need this distinction I think we should be able to
> > > differentiate by looking at current->mm (i.e. mm_struct) pointer being NULL
> > > for kernel thread.
> > Userspace mmap is handled by punching out the pte. So we don't need to do
> > anything special there.
> > 
> > For kernel mmap the begin/end should be all in the same context (so we
> > could use the srcu lock that works underneath drm_dev_enter/exit), since
> > at least right now kernel vmaps of dma-buf are very long-lived.
> 
> 
> If by same context you mean the right drm_device (the exporter's one)
> then this should be ok as I am seeing from amdgpu implementation
> of the callback - amdgpu_dma_buf_begin_cpu_access. We just need to add
> handler for .end_cpu_access callback to call drm_dev_exit there.

Same context = same system call, essentially. You cannot hold locks while
returning to userspace. And current userspace can call the
begin/end_cpu_access callbacks through ioctls, so just putting a
drm_dev_enter/exit in them will break really badly. Iirc there's an igt
also for testing these ioctls - if there isn't we really should have one.

Hence we need to be more careful here about who's calling and where we
can put the drm_dev_enter/exit.
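
For the purely in-kernel case the enter/exit pair can bracket the whole
begin -> access -> end sequence, since it all happens in one context. A
rough pseudocode sketch of that, not meant to compile as-is
(get_drm_dev_of() is a hypothetical helper to reach the exporter's
drm_device):

```c
/* Pseudocode sketch: the SRCU read section opened by drm_dev_enter() must
 * be closed in the same context, so for in-kernel CPU access we wrap the
 * whole begin -> access -> end sequence, rather than splitting the guard
 * across the begin/end_cpu_access callbacks (which userspace can also
 * reach via DMA_BUF_IOCTL_SYNC). */
static int kernel_cpu_read(struct dma_buf *buf, void *dst, size_t size)
{
	struct drm_device *dev = get_drm_dev_of(buf); /* hypothetical helper */
	void *vaddr;
	int idx, ret;

	if (!drm_dev_enter(dev, &idx))
		return -ENODEV;		/* device already unplugged */

	ret = dma_buf_begin_cpu_access(buf, DMA_FROM_DEVICE);
	if (!ret) {
		vaddr = /* existing kernel vmap of the buffer */;
		memcpy(dst, vaddr, size);	/* the guarded CPU access */
		dma_buf_end_cpu_access(buf, DMA_FROM_DEVICE);
	}

	drm_dev_exit(idx);	/* always closed before returning anywhere */
	return ret;
}
```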
-Daniel

> 
> Andrey
> 
> 
> > 
> > But the good news is that Thomas Zimmerman is working on this problem
> > already for different reasons, so it might be that we won't have any
> > long-lived kernel vmap anymore. And we could put the drm_dev_enter/exit in
> > there.
> > 
> > > > > Oh very very good point! I haven't thought about DMA-buf mmaps in this
> > > > > context yet.
> > > > > 
> > > > > 
> > > > > > btw the other issue with dma-buf (and even worse with dma_fence) is
> > > > > > refcounting of the underlying drm_device. I'd expect that all your
> > > > > > callbacks go boom if the dma_buf outlives your drm_device. That part isn't
> > > > > > yet solved in your series here.
> > > > > Well thinking more about this, it seems to be a another really good argument
> > > > > why mapping pages from DMA-bufs into application address space directly is a
> > > > > very bad idea :)
> > > > > 
> > > > > But yes, we essentially can't remove the device as long as there is a
> > > > > DMA-buf with mappings. No idea how to clean that one up.
> > > > drm_dev_get/put in drm_prime helpers should get us like 90% there I think.
> > > 
> > > What are the other 10% ?
> > dma_fence, which is also about 90% of the work probably. But I'm
> > guesstimating only 10% of the oopses you can hit. Since generally the
> > dma_fence for a buffer don't outlive the underlying buffer. So usually no
> > problems happen when we've solved the dma-buf sharing, but the dma_fence
> > can outlive the dma-buf, so there's still possibilities of crashing.
> > 
> > > > The even more worrying thing is random dma_fence attached to the dma_resv
> > > > object. We could try to clean all of ours up, but they could have escaped
> > > > already into some other driver. And since we're talking about egpu
> > > > hotunplug, dma_fence escaping to the igpu is a pretty reasonable use-case.
> > > > 
> > > > I have no how to fix that one :-/
> > > > -Daniel
> > > 
> > > I assume you are referring to sync_file_create/sync_file_get_fence API  for
> > > dma_fence export/import ?
> > So dma_fence is a general issue, there's a pile of interfaces that result
> > in sharing with other drivers:
> > - dma_resv in the dma_buf
> > - sync_file
> > - drm_syncobj (but I think that's not yet cross driver, but probably
> >    changes)
> > 
> > In each of these cases drivers can pick up the dma_fence and use it
> > internally for all kinds of purposes (could end up in the scheduler or
> > wherever).
> > 
> > > So with DMA bufs we have the drm_gem_object as exporter specific private data
> > > and so we can do drm_dev_get and put at the drm_gem_object layer to bind
> > > device life cycle
> > > to that of each GEM object but, we don't have such mid-layer for dma_fence
> > > which could allow
> > > us to increment device reference for each fence out there related to that
> > > device - is my understanding correct ?
> > Yeah that's the annoying part with dma-fence. No existing generic place to
> > put the drm_dev_get/put. tbf I'd note this as a todo and try to solve the
> > other problems first.
> > -Daniel
> > 
> > > Andrey
> > > 
> > > 
> > > Andrey
> > > 
> > > 
> > > > > Christian.
> > > > > 
> > > > > > -Daniel
> > > > > > 
> > > > > > > > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > > > > > > > Is there other places ?
> > > > > > > Puh, good question. I have no idea.
> > > > > > > 
> > > > > > > > Another point is that at this point the driver shouldn't access any such
> > > > > > > > buffers as we are at the process finishing the device.
> > > > > > > > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > > > > > > > think there is anything else to do ?
> > > > > > > Well there is a page fault handler for kernel mappings, but that one just
> > > > > > > prints the stack trace into the system log and calls BUG(); :)
> > > > > > > 
> > > > > > > Long story short we need to avoid any access to released pages after unplug.
> > > > > > > No matter if it's from the kernel or userspace.
> > > > > > > 
> > > > > > > Regards,
> > > > > > > Christian.
> > > > > > > 
> > > > > > > > Andrey

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-11-24 16:44                     ` Christian König
@ 2020-12-15 20:18                       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-15 20:18 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/24/20 11:44 AM, Christian König wrote:
> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
>>
>> On 11/24/20 2:41 AM, Christian König wrote:
>>> Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
>>>>
>>>> On 11/23/20 3:41 PM, Christian König wrote:
>>>>> Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
>>>>>>
>>>>>> On 11/23/20 3:20 PM, Christian König wrote:
>>>>>>> Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
>>>>>>>>
>>>>>>>> On 11/25/20 5:42 AM, Christian König wrote:
>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>> It's needed to drop iommu backed pages on device unplug
>>>>>>>>>> before device's IOMMU group is released.
>>>>>>>>>
>>>>>>>>> It would be cleaner if we could do the whole handling in TTM. I also 
>>>>>>>>> need to double check what you are doing with this function.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>
>>>>>>>> Check patch "drm/amdgpu: Register IOMMU topology notifier per device." 
>>>>>>>> to see
>>>>>>>> how i use it. I don't see why this should go into TTM mid-layer - the 
>>>>>>>> stuff I do inside
>>>>>>>> is vendor specific and also I don't think TTM is explicitly aware of 
>>>>>>>> IOMMU ?
>>>>>>>> Do you mean you prefer the IOMMU notifier to be registered from within TTM
>>>>>>>> and then use a hook to call into vendor specific handler ?
>>>>>>>
>>>>>>> No, that is really vendor specific.
>>>>>>>
>>>>>>> What I meant is to have a function like ttm_resource_manager_evict_all() 
>>>>>>> which you only need to call and all tt objects are unpopulated.
>>>>>>
>>>>>>
>>>>>> So instead of this BO list i create and later iterate in amdgpu from the 
>>>>>> IOMMU patch you just want to do it within
>>>>>> TTM with a single function ? Makes much more sense.
>>>>>
>>>>> Yes, exactly.
>>>>>
>>>>> The list_empty() checks we have in TTM for the LRU are actually not the 
>>>>> best idea, we should now check the pin_count instead. This way we could 
>>>>> also have a list of the pinned BOs in TTM.
>>>>
>>>>
>>>> So from my IOMMU topology handler I will iterate the TTM LRU for the 
>>>> unpinned BOs and this new function for the pinned ones  ?
>>>> It's probably a good idea to combine both iterations into this new function 
>>>> to cover all the BOs allocated on the device.
>>>
>>> Yes, that's what I had in my mind as well.
>>>
>>>>
>>>>
>>>>>
>>>>> BTW: Have you thought about what happens when we unpopulate a BO while we 
>>>>> still try to use a kernel mapping for it? That could have unforeseen 
>>>>> consequences.
>>>>
>>>>
>>>> Are you asking what happens to kmap or vmap style mapped CPU accesses once 
>>>> we drop all the DMA backing pages for a particular BO ? Because for user 
>>>> mappings
>>>> (mmap) we took care of this with dummy page reroute but indeed nothing was 
>>>> done for in kernel CPU mappings.
>>>
>>> Yes exactly that.
>>>
>>> In other words what happens if we free the ring buffer while the kernel 
>>> still writes to it?
>>>
>>> Christian.
>>
>>
>> While we can't control user application accesses to the mapped buffers 
>> explicitly and hence we use page fault rerouting
>> I am thinking that in this  case we may be able to sprinkle 
>> drm_dev_enter/exit in any such sensitive place were we might
>> CPU access a DMA buffer from the kernel ?
>
> Yes, I fear we are going to need that.
>
>> Things like CPU page table updates, ring buffer accesses and FW memcpy ? Is 
>> there other places ?
>
> Puh, good question. I have no idea.
>
>> Another point is that at this point the driver shouldn't access any such 
>> buffers as we are at the process finishing the device.
>> AFAIK there is no page fault mechanism for kernel mappings so I don't think 
>> there is anything else to do ?
>
> Well there is a page fault handler for kernel mappings, but that one just 
> prints the stack trace into the system log and calls BUG(); :)
>
> Long story short we need to avoid any access to released pages after unplug. 
> No matter if it's from the kernel or userspace.


I was just about to start guarding CPU accesses from the kernel to GTT or 
VRAM buffers with drm_dev_enter/exit, but then I looked more into the code 
and it seems like ttm_tt_unpopulate just deletes the DMA mappings (for the 
sake of device to main memory access). The kernel page table is not touched 
until the last bo refcount is dropped and the bo is released 
(ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This is both 
for GTT BOs mapped to the kernel by kmap (or vmap) and for VRAM BOs mapped 
by ioremap. So as I see it, nothing bad will happen after we unpopulate a BO 
while we still try to use a kernel mapping for it: the system memory pages 
backing GTT BOs are still mapped and not freed, and for VRAM BOs the same 
holds for the IO physical ranges mapped into the kernel page table, since 
iounmap wasn't called yet. I loaded the driver with vm_update_mode=3, 
meaning all VM updates are done using the CPU, and haven't seen any OOPSes 
after removing the device. I guess I can test it more by allocating GTT and 
VRAM BOs and trying to read/write to them after the device is removed.
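
The test could look roughly like this. The PCI address and the test binary
are hypothetical placeholders; only the sysfs remove path is the standard
unplug emulation:

```sh
# Manual test sketch: emulate hotunplug via sysfs while a client keeps
# CPU-accessing GTT and VRAM BOs.
modprobe amdgpu vm_update_mode=3        # all VM updates done by the CPU
./bo_cpu_access_test &                  # hypothetical: maps BOs, loops read/write
echo 1 > /sys/bus/pci/devices/0000:0b:00.0/remove
wait                                    # client should see SIGBUS/errors, not hang
dmesg | grep -iE 'oops|bug:'            # expect no kernel OOPSes
```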

Andrey


>
> Regards,
> Christian.
>
>>
>> Andrey
>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-15 20:18                       ` Andrey Grodzovsky
@ 2020-12-16  8:04                         ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-12-16  8:04 UTC (permalink / raw)
  To: Andrey Grodzovsky, christian.koenig, amd-gfx, dri-devel,
	daniel.vetter, robh, l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh

Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
> [SNIP]
>>>
>>> While we can't control user application accesses to the mapped 
>>> buffers explicitly and hence we use page fault rerouting
>>> I am thinking that in this  case we may be able to sprinkle 
>>> drm_dev_enter/exit in any such sensitive place were we might
>>> CPU access a DMA buffer from the kernel ?
>>
>> Yes, I fear we are going to need that.
>>
>>> Things like CPU page table updates, ring buffer accesses and FW 
>>> memcpy ? Is there other places ?
>>
>> Puh, good question. I have no idea.
>>
>>> Another point is that at this point the driver shouldn't access any 
>>> such buffers as we are at the process finishing the device.
>>> AFAIK there is no page fault mechanism for kernel mappings so I 
>>> don't think there is anything else to do ?
>>
>> Well there is a page fault handler for kernel mappings, but that one 
>> just prints the stack trace into the system log and calls BUG(); :)
>>
>> Long story short we need to avoid any access to released pages after 
>> unplug. No matter if it's from the kernel or userspace.
>
>
> I was just about to start guarding with drm_dev_enter/exit CPU 
> accesses from kernel to GTT or VRAM buffers but then I looked more in 
> the code
> and it seems like ttm_tt_unpopulate just deletes DMA mappings (for the 
> sake of device to main memory access). Kernel page table is not touched
> until last bo refcount is dropped and the bo is released 
> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This 
> is both
> for GTT BOs mapped to kernel by kmap (or vmap) and for VRAM BOs mapped 
> by ioremap. So as I see it, nothing bad will happen after we
> unpopulate a BO while we still try to use a kernel mapping for it, 
> system memory pages backing GTT BOs are still mapped and not freed and 
> for
> VRAM BOs same is for the IO physical ranges mapped into the kernel 
> page table since iounmap wasn't called yet.

The problem is that the system pages would be freed, and if the kernel 
driver still happily writes to them we are pretty much busted, because 
we write to freed up memory.
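
One way to avoid such writes, following the drm_dev_enter/exit idea
discussed above, would be a guard at every kernel-side access point. A
pseudocode sketch, not meant to compile as-is (ring_to_drm_dev() is a
hypothetical helper):

```c
/* Pseudocode sketch: skip the CPU write entirely once the device is gone,
 * so we never touch system pages that may already have been freed. */
static void ring_buffer_write(struct amdgpu_ring *ring, u32 value)
{
	int idx;

	if (!drm_dev_enter(ring_to_drm_dev(ring), &idx))
		return;		/* unplugged: drop the write on the floor */

	/* the guarded CPU access to the kernel-mapped ring buffer BO */
	ring->ring[ring->wptr++ & ring->buf_mask] = value;

	drm_dev_exit(idx);
}
```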

Christian.

> I loaded the driver with vm_update_mode=3
> meaning all VM updates done using CPU and hasn't seen any OOPs after 
> removing the device. I guess i can test it more by allocating GTT and 
> VRAM BOs
> and trying to read/write to them after device is removed.
>
> Andrey
>
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Andrey
>>
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-16  8:04                         ` Christian König
@ 2020-12-16 14:21                           ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-16 14:21 UTC (permalink / raw)
  To: Christian König
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher

On Wed, Dec 16, 2020 at 9:04 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
> > [SNIP]
> >>>
> >>> While we can't control user application accesses to the mapped
> >>> buffers explicitly and hence we use page fault rerouting
> >>> I am thinking that in this  case we may be able to sprinkle
> >>> drm_dev_enter/exit in any such sensitive place were we might
> >>> CPU access a DMA buffer from the kernel ?
> >>
> >> Yes, I fear we are going to need that.
> >>
> >>> Things like CPU page table updates, ring buffer accesses and FW
> >>> memcpy ? Is there other places ?
> >>
> >> Puh, good question. I have no idea.
> >>
> >>> Another point is that at this point the driver shouldn't access any
> >>> such buffers as we are at the process finishing the device.
> >>> AFAIK there is no page fault mechanism for kernel mappings so I
> >>> don't think there is anything else to do ?
> >>
> >> Well there is a page fault handler for kernel mappings, but that one
> >> just prints the stack trace into the system log and calls BUG(); :)
> >>
> >> Long story short we need to avoid any access to released pages after
> >> unplug. No matter if it's from the kernel or userspace.
> >
> >
> > I was just about to start guarding with drm_dev_enter/exit CPU
> > accesses from kernel to GTT or VRAM buffers but then I looked more in
> > the code
> > and it seems like ttm_tt_unpopulate just deletes DMA mappings (for the
> > sake of device to main memory access). Kernel page table is not touched
> > until last bo refcount is dropped and the bo is released
> > (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
> > is both
> > for GTT BOs mapped to kernel by kmap (or vmap) and for VRAM BOs mapped
> > by ioremap. So as I see it, nothing bad will happen after we
> > unpopulate a BO while we still try to use a kernel mapping for it,
> > system memory pages backing GTT BOs are still mapped and not freed and
> > for
> > VRAM BOs same is for the IO physical ranges mapped into the kernel
> > page table since iounmap wasn't called yet.
>
> The problem is the system pages would be freed and if we kernel driver
> still happily write to them we are pretty much busted because we write
> to freed up memory.

Similar for vram: if this is an actual hotunplug and then replug, there's
most likely going to be a different device behind the same mmio bar range
(the bridges higher up all have the same windows assigned), and that's
bad news if we keep using it from the current driver. So we really have
to point all these cpu ptes to some other place.
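
That repointing could build on the dummy-page reroute already done for
userspace mmaps: zap all user PTEs at unplug time and have the fault
handler install a dummy page afterwards. A pseudocode sketch, not meant
to compile as-is (vma_to_drm_dev() and dev_dummy_page are hypothetical):

```c
/* Pseudocode sketch: on unplug, invalidate every CPU PTE created through
 * the drm file's mmap offset space ... */
static void unplug_zap_user_mappings(struct drm_device *dev)
{
	unmap_mapping_range(dev->anon_inode->i_mapping, 0, 0, 1);
}

/* ... and afterwards let faults land on a dummy page instead of the stale
 * BAR range, so a replugged device behind the same window is never touched. */
static vm_fault_t gem_object_fault(struct vm_fault *vmf)
{
	struct drm_device *dev = vma_to_drm_dev(vmf->vma);
	vm_fault_t ret;
	int idx;

	if (drm_dev_enter(dev, &idx)) {
		ret = /* normal TTM fault path */;
		drm_dev_exit(idx);
	} else {
		ret = vmf_insert_pfn(vmf->vma, vmf->address,
				     page_to_pfn(dev_dummy_page));
	}
	return ret;
}
```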
-Daniel

>
> Christian.
>
> > I loaded the driver with vm_update_mode=3
> > meaning all VM updates done using CPU and hasn't seen any OOPs after
> > removing the device. I guess i can test it more by allocating GTT and
> > VRAM BOs
> > and trying to read/write to them after device is removed.
> >
> > Andrey
> >
> >
> >>
> >> Regards,
> >> Christian.
> >>
> >>>
> >>> Andrey
> >>
> >>
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-12-16 14:21                           ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-16 14:21 UTC (permalink / raw)
  To: Christian König
  Cc: Andrey Grodzovsky, amd-gfx list, Rob Herring, Greg KH, dri-devel,
	Anholt, Eric, Pekka Paalanen, Qiang Yu, Alex Deucher, Wentland,
	Harry, Lucas Stach

On Wed, Dec 16, 2020 at 9:04 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
> > [SNIP]
> >>>
> >>> While we can't control user application accesses to the mapped
> >>> buffers explicitly, and hence we use page fault rerouting,
> >>> I am thinking that in this case we may be able to sprinkle
> >>> drm_dev_enter/exit in any such sensitive place where we might
> >>> CPU-access a DMA buffer from the kernel ?
> >>
> >> Yes, I fear we are going to need that.
> >>
> >>> Things like CPU page table updates, ring buffer accesses and FW
> >>> memcpy ? Is there other places ?
> >>
> >> Puh, good question. I have no idea.
> >>
> >>> Another point is that at this stage the driver shouldn't access any
> >>> such buffers, as we are in the process of finishing the device.
> >>> AFAIK there is no page fault mechanism for kernel mappings, so I
> >>> don't think there is anything else to do ?
> >>
> >> Well there is a page fault handler for kernel mappings, but that one
> >> just prints the stack trace into the system log and calls BUG(); :)
> >>
> >> Long story short we need to avoid any access to released pages after
> >> unplug. No matter if it's from the kernel or userspace.
> >
> >
> > I was just about to start guarding CPU accesses from the kernel to GTT
> > or VRAM buffers with drm_dev_enter/exit, but then I looked more into
> > the code, and it seems ttm_tt_unpopulate just deletes the DMA mappings
> > (for the sake of device access to main memory). The kernel page table is
> > not touched until the last bo refcount is dropped and the bo is released
> > (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
> > holds both for GTT BOs mapped to the kernel by kmap (or vmap) and for
> > VRAM BOs mapped by ioremap. So as I see it, nothing bad will happen if
> > we unpopulate a BO while we still use a kernel mapping for it: the
> > system memory pages backing GTT BOs are still mapped and not freed, and
> > the same holds for the IO physical ranges of VRAM BOs mapped into the
> > kernel page table, since iounmap wasn't called yet.
>
> The problem is that the system pages would be freed, and if the kernel
> driver still happily writes to them we are pretty much busted, because
> we are writing to freed memory.

Similar for vram: if this is an actual hotunplug and then replug, there's
most likely going to be a different device behind the same mmio bar range
(the higher bridges all have the same windows assigned), and that's bad
news if we keep using it for the current drivers. So we really have to
point all these cpu ptes to some other place.
-Daniel

>
> Christian.
>
> > I loaded the driver with vm_update_mode=3, meaning all VM updates are
> > done using the CPU, and I haven't seen any OOPSes after removing the
> > device. I guess I can test it more by allocating GTT and VRAM BOs and
> > trying to read/write to them after the device is removed.
> >
> > Andrey
> >
> >
> >>
> >> Regards,
> >> Christian.
> >>
> >>>
> >>> Andrey
> >>
> >>
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-16 14:21                           ` Daniel Vetter
@ 2020-12-16 16:13                             ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-16 16:13 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher


On 12/16/20 9:21 AM, Daniel Vetter wrote:
> On Wed, Dec 16, 2020 at 9:04 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
>>> [SNIP]
>>>>> While we can't control user application accesses to the mapped
>>>>> buffers explicitly, and hence we use page fault rerouting,
>>>>> I am thinking that in this case we may be able to sprinkle
>>>>> drm_dev_enter/exit in any such sensitive place where we might
>>>>> CPU-access a DMA buffer from the kernel ?
>>>> Yes, I fear we are going to need that.
>>>>
>>>>> Things like CPU page table updates, ring buffer accesses and FW
>>>>> memcpy ? Is there other places ?
>>>> Puh, good question. I have no idea.
>>>>
>>>>> Another point is that at this stage the driver shouldn't access any
>>>>> such buffers, as we are in the process of finishing the device.
>>>>> AFAIK there is no page fault mechanism for kernel mappings, so I
>>>>> don't think there is anything else to do ?
>>>> Well there is a page fault handler for kernel mappings, but that one
>>>> just prints the stack trace into the system log and calls BUG(); :)
>>>>
>>>> Long story short we need to avoid any access to released pages after
>>>> unplug. No matter if it's from the kernel or userspace.
>>>
>>> I was just about to start guarding CPU accesses from the kernel to GTT
>>> or VRAM buffers with drm_dev_enter/exit, but then I looked more into
>>> the code, and it seems ttm_tt_unpopulate just deletes the DMA mappings
>>> (for the sake of device access to main memory). The kernel page table is
>>> not touched until the last bo refcount is dropped and the bo is released
>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
>>> holds both for GTT BOs mapped to the kernel by kmap (or vmap) and for
>>> VRAM BOs mapped by ioremap. So as I see it, nothing bad will happen if
>>> we unpopulate a BO while we still use a kernel mapping for it: the
>>> system memory pages backing GTT BOs are still mapped and not freed, and
>>> the same holds for the IO physical ranges of VRAM BOs mapped into the
>>> kernel page table, since iounmap wasn't called yet.
>> The problem is that the system pages would be freed, and if the kernel
>> driver still happily writes to them we are pretty much busted, because
>> we are writing to freed memory.


OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will
release the GTT BO pages. But then isn't there a problem in
ttm_bo_release, since ttm_bo_cleanup_memtype_use, which also leads to
the pages being released, comes before bo->destroy, which unmaps the
pages from the kernel page table? Won't we end up writing to freed
memory in this interval? Don't we need to postpone freeing the pages
until after the kernel page table unmapping?


> Similar for vram: if this is an actual hotunplug and then replug, there's
> most likely going to be a different device behind the same mmio bar range
> (the higher bridges all have the same windows assigned),


No idea how this actually works, but if we haven't called iounmap yet,
doesn't that mean those physical ranges that are still mapped into the
page table should be reserved and cannot be reused for another device?
As a guess, maybe another subrange from the higher bridge's total range
will be allocated.

> and that's bad news if we keep using it for the current drivers. So we
> really have to point all these cpu ptes to some other place.


We can't just unmap it without syncing against any in-kernel accesses to
those buffers, and since the page-fault technique we use for user-mapped
buffers doesn't seem possible for kernel-mapped buffers, I am not sure
how to do it gracefully...
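[Editorial note: the drm_dev_enter/exit guard discussed in this thread can be sketched in plain userspace C. This is only a simulation under stated assumptions: the real kernel primitives are SRCU-based so that drm_dev_unplug() can wait for in-flight critical sections to drain, and every name below (device_unplugged, dev_enter, write_to_bo) is hypothetical, not the actual DRM API.]

```c
#include <stdatomic.h>
#include <errno.h>

/* Stand-in for drm_dev_enter()/drm_dev_exit(): the real primitives use
 * SRCU so an unplug can drain all critical sections; a plain atomic
 * flag only models the check-before-access part of the pattern. */
static atomic_bool device_unplugged;

static int dev_enter(void)
{
    /* Refuse to start a hardware access once the device is gone. */
    return atomic_load(&device_unplugged) ? 0 : 1;
}

static void dev_exit(void)
{
    /* SRCU read-unlock in the real kernel; a no-op in this sketch. */
}

/* A kernel-internal CPU write to a device buffer, guarded as the thread
 * suggests: skip the access and report -ENODEV after unplug. In the
 * SRCU version, unplug cannot complete while we are inside the section. */
static int write_to_bo(int *bo_cpu_addr, int value)
{
    if (!dev_enter())
        return -ENODEV;   /* device gone: don't touch freed pages */
    *bo_cpu_addr = value;
    dev_exit();
    return 0;
}
```

Usage: sprinkle this pair around each "sensitive place" (page table updates, ring buffer writes, firmware memcpy) so late accesses become harmless error returns instead of writes to freed memory.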

Andrey


> -Daniel
>
>> Christian.
>>
>>> I loaded the driver with vm_update_mode=3, meaning all VM updates are
>>> done using the CPU, and I haven't seen any OOPSes after removing the
>>> device. I guess I can test it more by allocating GTT and VRAM BOs and
>>> trying to read/write to them after the device is removed.
>>>
>>> Andrey
>>>
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Andrey
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-16 16:13                             ` Andrey Grodzovsky
@ 2020-12-16 16:18                               ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-12-16 16:18 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher

Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
>
> On 12/16/20 9:21 AM, Daniel Vetter wrote:
>> On Wed, Dec 16, 2020 at 9:04 AM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
>>>> [SNIP]
>>>>>> While we can't control user application accesses to the mapped
>>>>>> buffers explicitly, and hence we use page fault rerouting,
>>>>>> I am thinking that in this case we may be able to sprinkle
>>>>>> drm_dev_enter/exit in any such sensitive place where we might
>>>>>> CPU-access a DMA buffer from the kernel ?
>>>>> Yes, I fear we are going to need that.
>>>>>
>>>>>> Things like CPU page table updates, ring buffer accesses and FW
>>>>>> memcpy ? Is there other places ?
>>>>> Puh, good question. I have no idea.
>>>>>
>>>>>> Another point is that at this stage the driver shouldn't access any
>>>>>> such buffers, as we are in the process of finishing the device.
>>>>>> AFAIK there is no page fault mechanism for kernel mappings, so I
>>>>>> don't think there is anything else to do ?
>>>>> Well there is a page fault handler for kernel mappings, but that one
>>>>> just prints the stack trace into the system log and calls BUG(); :)
>>>>>
>>>>> Long story short we need to avoid any access to released pages after
>>>>> unplug. No matter if it's from the kernel or userspace.
>>>>
>>>> I was just about to start guarding CPU accesses from the kernel to GTT
>>>> or VRAM buffers with drm_dev_enter/exit, but then I looked more into
>>>> the code, and it seems ttm_tt_unpopulate just deletes the DMA mappings
>>>> (for the sake of device access to main memory). The kernel page table is
>>>> not touched until the last bo refcount is dropped and the bo is released
>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
>>>> holds both for GTT BOs mapped to the kernel by kmap (or vmap) and for
>>>> VRAM BOs mapped by ioremap. So as I see it, nothing bad will happen if
>>>> we unpopulate a BO while we still use a kernel mapping for it: the
>>>> system memory pages backing GTT BOs are still mapped and not freed, and
>>>> the same holds for the IO physical ranges of VRAM BOs mapped into the
>>>> kernel page table, since iounmap wasn't called yet.
>>> The problem is that the system pages would be freed, and if the kernel
>>> driver still happily writes to them we are pretty much busted, because
>>> we are writing to freed memory.
>
>
> OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will
> release the GTT BO pages. But then isn't there a problem in
> ttm_bo_release, since ttm_bo_cleanup_memtype_use, which also leads to
> the pages being released, comes before bo->destroy, which unmaps the
> pages from the kernel page table? Won't we end up writing to freed
> memory in this interval? Don't we need to postpone freeing the pages
> until after the kernel page table unmapping?

BOs are only destroyed when there is a guarantee that nobody is 
accessing them any more.

The problem here is that the pages as well as the VRAM can be 
immediately reused after the hotplug event.

>
>
>> Similar for vram: if this is an actual hotunplug and then replug, there's
>> most likely going to be a different device behind the same mmio bar range
>> (the higher bridges all have the same windows assigned),
>
>
> No idea how this actually works, but if we haven't called iounmap yet,
> doesn't that mean those physical ranges that are still mapped into the
> page table should be reserved and cannot be reused for another device?
> As a guess, maybe another subrange from the higher bridge's total range
> will be allocated.

Nope, the PCIe subsystem doesn't care about any ioremap still active for 
a range when it is hotplugged.

>
>> and that's bad news if we keep using it for the current drivers. So we
>> really have to point all these cpu ptes to some other place.
>
>
> We can't just unmap it without syncing against any in-kernel accesses to
> those buffers, and since the page-fault technique we use for user-mapped
> buffers doesn't seem possible for kernel-mapped buffers, I am not sure
> how to do it gracefully...

We could try to replace the kmap with a dummy page under the hood, but 
that is extremely tricky.

Especially since BOs which are just 1 page in size could point to the 
linear mapping directly.

Christian.
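[Editorial note: a minimal sketch of the dummy-page idea above, assuming an indirection layer that TTM does not actually have; every name here is hypothetical. Kernel users would hold an indirect handle rather than a raw kmap pointer, so hot-unplug can redirect late writers to a scratch page instead of freed memory.]

```c
#include <string.h>

enum { PAGE_SZ = 4096 };

static char real_page[PAGE_SZ];   /* stands in for the BO's kmap'ed page */
static char dummy_page[PAGE_SZ];  /* sink for writes arriving after unplug */

/* Hypothetical indirection: users dereference the handle on every
 * access instead of caching the raw address. */
struct kmap_handle {
    char *addr;                   /* current backing: real or dummy page */
};

static void kmap_handle_init(struct kmap_handle *h)
{
    h->addr = real_page;
}

/* On unplug, repoint the handle at the dummy page so stale kernel
 * writers scribble on the scratch page, not on reused memory. */
static void kmap_handle_unplug(struct kmap_handle *h)
{
    h->addr = dummy_page;
}

static void bo_write(struct kmap_handle *h, const char *src, size_t len)
{
    memcpy(h->addr, src, len);
}
```

In the real kernel this swap would have to be made atomic against concurrent accessors (e.g. by remapping the vmap'ed range), which is the "extremely tricky" part Christian mentions; one-page BOs whose kmap is just a linear-map address cannot be redirected this way at all.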

>
> Andrey
>
>
>> -Daniel
>>
>>> Christian.
>>>
>>>> I loaded the driver with vm_update_mode=3, meaning all VM updates are
>>>> done using the CPU, and I haven't seen any OOPSes after removing the
>>>> device. I guess I can test it more by allocating GTT and VRAM BOs and
>>>> trying to read/write to them after the device is removed.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-16 16:18                               ` Christian König
@ 2020-12-16 17:12                                 ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-16 17:12 UTC (permalink / raw)
  To: Christian König
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher

On Wed, Dec 16, 2020 at 5:18 PM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
> >
> > On 12/16/20 9:21 AM, Daniel Vetter wrote:
> >> On Wed, Dec 16, 2020 at 9:04 AM Christian König
> >> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
> >>>> [SNIP]
> >>>>>> While we can't control user application accesses to the mapped
> >>>>>> buffers explicitly, and hence we use page fault rerouting,
> >>>>>> I am thinking that in this case we may be able to sprinkle
> >>>>>> drm_dev_enter/exit in any such sensitive place where we might
> >>>>>> CPU-access a DMA buffer from the kernel ?
> >>>>> Yes, I fear we are going to need that.
> >>>>>
> >>>>>> Things like CPU page table updates, ring buffer accesses and FW
> >>>>>> memcpy ? Is there other places ?
> >>>>> Puh, good question. I have no idea.
> >>>>>
> >>>>>> Another point is that at this stage the driver shouldn't access any
> >>>>>> such buffers, as we are in the process of finishing the device.
> >>>>>> AFAIK there is no page fault mechanism for kernel mappings, so I
> >>>>>> don't think there is anything else to do ?
> >>>>> Well there is a page fault handler for kernel mappings, but that one
> >>>>> just prints the stack trace into the system log and calls BUG(); :)
> >>>>>
> >>>>> Long story short we need to avoid any access to released pages after
> >>>>> unplug. No matter if it's from the kernel or userspace.
> >>>>
> >>>> I was just about to start guarding CPU accesses from the kernel to GTT
> >>>> or VRAM buffers with drm_dev_enter/exit, but then I looked more into
> >>>> the code, and it seems ttm_tt_unpopulate just deletes the DMA mappings
> >>>> (for the sake of device access to main memory). The kernel page table is
> >>>> not touched until the last bo refcount is dropped and the bo is released
> >>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
> >>>> holds both for GTT BOs mapped to the kernel by kmap (or vmap) and for
> >>>> VRAM BOs mapped by ioremap. So as I see it, nothing bad will happen if
> >>>> we unpopulate a BO while we still use a kernel mapping for it: the
> >>>> system memory pages backing GTT BOs are still mapped and not freed, and
> >>>> the same holds for the IO physical ranges of VRAM BOs mapped into the
> >>>> kernel page table, since iounmap wasn't called yet.
> >>> The problem is that the system pages would be freed, and if the kernel
> >>> driver still happily writes to them we are pretty much busted, because
> >>> we are writing to freed memory.
> >
> >
> > OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will
> > release the GTT BO pages. But then isn't there a problem in
> > ttm_bo_release, since ttm_bo_cleanup_memtype_use, which also leads to
> > the pages being released, comes before bo->destroy, which unmaps the
> > pages from the kernel page table? Won't we end up writing to freed
> > memory in this interval? Don't we need to postpone freeing the pages
> > until after the kernel page table unmapping?
>
> BOs are only destroyed when there is a guarantee that nobody is
> accessing them any more.
>
> The problem here is that the pages as well as the VRAM can be
> immediately reused after the hotplug event.
>
> >
> >
> >> Similar for vram: if this is an actual hotunplug and then replug, there's
> >> most likely going to be a different device behind the same mmio bar range
> >> (the higher bridges all have the same windows assigned),
> >
> >
> > No idea how this actually works, but if we haven't called iounmap yet,
> > doesn't that mean those physical ranges that are still mapped into the
> > page table should be reserved and cannot be reused for another device?
> > As a guess, maybe another subrange from the higher bridge's total range
> > will be allocated.
>
> Nope, the PCIe subsystem doesn't care about any ioremap still active for
> a range when it is hotplugged.
>
> >
> >> and that's bad news if we keep using it for the current drivers. So we
> >> really have to point all these cpu ptes to some other place.
> >
> >
> > We can't just unmap it without syncing against any in-kernel accesses to
> > those buffers, and since the page-fault technique we use for user-mapped
> > buffers doesn't seem possible for kernel-mapped buffers, I am not sure
> > how to do it gracefully...
>
> We could try to replace the kmap with a dummy page under the hood, but
> that is extremely tricky.
>
> Especially since BOs which are just 1 page in size could point to the
> linear mapping directly.

I think it's just more work. Essentially:
- Convert as much as possible of the kernel mappings to vmap_local,
which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
serve as a barrier, and of course any new vmap needs to fail or hand out
a dummy mapping.
- Handle fbcon somehow. I think shutting it all down should work out.
- Worst case, keep the system backing storage around for shared dma-buf
until the other, non-dynamic driver releases it. For vram we require
dynamic importers (and maybe it wasn't such a bright idea to allow
pinning of importer buffers; we might need to revisit that).
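[Editorial note: the first bullet can be sketched as a userspace C simulation, assuming a pthread mutex stands in for dma_resv_lock; the function names mirror the vmap_local idea but are illustrative, not the real dma-buf API. A short-term mapping exists only while the reservation lock is held, so unplug taking that lock is automatically a barrier, and any vmap attempted after unplug fails.]

```c
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the buffer's dma_resv lock. */
static pthread_mutex_t resv_lock = PTHREAD_MUTEX_INITIALIZER;
static int unplugged;
static char backing[4096];        /* the BO's backing storage */

/* Short-term kernel mapping: taken under the resv lock and refused
 * once the device is gone. Caller holds resv_lock until vunmap_local(). */
static void *vmap_local(void)
{
    pthread_mutex_lock(&resv_lock);
    if (unplugged) {
        pthread_mutex_unlock(&resv_lock);
        return NULL;              /* new vmaps must fail after unplug */
    }
    return backing;
}

static void vunmap_local(void)
{
    pthread_mutex_unlock(&resv_lock);
}

/* Unplug: acquiring the resv lock serves as the barrier against any
 * vmap_local section still in flight; afterwards no mapping is live. */
static void device_unplug(void)
{
    pthread_mutex_lock(&resv_lock);
    unplugged = 1;
    pthread_mutex_unlock(&resv_lock);
}
```

The design point is that no page-fault machinery is needed for kernel mappings: because every mapping's lifetime is bracketed by the lock, the unplug path simply waits its turn instead of revoking PTEs from under a user.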

Cheers, Daniel

>
> Christian.
>
> >
> > Andrey
> >
> >
> >> -Daniel
> >>
> >>> Christian.
> >>>
> >>>> I loaded the driver with vm_update_mode=3, meaning all VM updates are
> >>>> done using the CPU, and I haven't seen any OOPSes after removing the
> >>>> device. I guess I can test it more by allocating GTT and VRAM BOs and
> >>>> trying to read/write to them after the device is removed.
> >>>>
> >>>> Andrey
> >>>>
> >>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>>> Andrey
> >>>>>
> >>>> _______________________________________________
> >>>> amd-gfx mailing list
> >>>> amd-gfx@lists.freedesktop.org
> >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >>>>
> >>
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-16 17:12                                 ` Daniel Vetter
@ 2020-12-16 17:20                                   ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-16 17:20 UTC (permalink / raw)
  To: Christian König
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher

On Wed, Dec 16, 2020 at 6:12 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Wed, Dec 16, 2020 at 5:18 PM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
> > >
> > > On 12/16/20 9:21 AM, Daniel Vetter wrote:
> > >> On Wed, Dec 16, 2020 at 9:04 AM Christian König
> > >> <ckoenig.leichtzumerken@gmail.com> wrote:
> > >>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
> > >>>> [SNIP]
> > >>>>>> While we can't control user application accesses to the mapped
> > >>>>>> buffers explicitly, and hence we use page fault rerouting,
> > >>>>>> I am thinking that in this case we may be able to sprinkle
> > >>>>>> drm_dev_enter/exit in any such sensitive place where we might
> > >>>>>> CPU-access a DMA buffer from the kernel?
> > >>>>> Yes, I fear we are going to need that.
> > >>>>>
> > >>>>>> Things like CPU page table updates, ring buffer accesses and FW
> > >>>>>> memcpy ? Is there other places ?
> > >>>>> Puh, good question. I have no idea.
> > >>>>>
> > >>>>>> Another point is that at this point the driver shouldn't access any
> > >>>>>> such buffers as we are at the process finishing the device.
> > >>>>>> AFAIK there is no page fault mechanism for kernel mappings so I
> > >>>>>> don't think there is anything else to do ?
> > >>>>> Well there is a page fault handler for kernel mappings, but that one
> > >>>>> just prints the stack trace into the system log and calls BUG(); :)
> > >>>>>
> > >>>>> Long story short we need to avoid any access to released pages after
> > >>>>> unplug. No matter if it's from the kernel or userspace.
> > >>>>
> > >>>> I was just about to start guarding CPU accesses from the kernel
> > >>>> to GTT or VRAM buffers with drm_dev_enter/exit, but then I looked
> > >>>> more into the code, and it seems like ttm_tt_unpopulate just
> > >>>> deletes DMA mappings (for the sake of device to main memory
> > >>>> access). The kernel page table is not touched until the last bo
> > >>>> refcount is dropped and the bo is released
> > >>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap).
> > >>>> This is the case both for GTT BOs mapped to the kernel by kmap
> > >>>> (or vmap) and for VRAM BOs mapped by ioremap. So as I see it,
> > >>>> nothing bad will happen after we unpopulate a BO while we still
> > >>>> try to use a kernel mapping for it: the system memory pages
> > >>>> backing GTT BOs are still mapped and not freed, and the same
> > >>>> holds for the IO physical ranges of VRAM BOs mapped into the
> > >>>> kernel page table, since iounmap wasn't called yet.
> > >>> The problem is the system pages would be freed, and if the kernel
> > >>> driver still happily writes to them we are pretty much busted,
> > >>> because we write to freed-up memory.
> > >
> > >
> > > OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will
> > > release the GTT BO pages. But then isn't there a problem in
> > > ttm_bo_release, since ttm_bo_cleanup_memtype_use, which also leads
> > > to page release, comes before bo->destroy, which unmaps the pages
> > > from the kernel page table? Won't we end up writing to freed memory
> > > in this time interval? Don't we need to postpone freeing the pages
> > > until after the kernel page table unmapping?
> >
> > BOs are only destroyed when there is a guarantee that nobody is
> > accessing them any more.
> >
> > The problem here is that the pages as well as the VRAM can be
> > immediately reused after the hotplug event.
> >
> > >
> > >
> > >> Similar for vram: if this is an actual hotunplug and then replug, there's
> > >> going to be a different device behind the same mmio bar range most
> > >> likely (the higher-level bridges all have the same windows assigned),
> > >
> > >
> > > No idea how this actually works, but if we haven't called iounmap
> > > yet, doesn't it mean that those physical ranges that are still
> > > mapped into the page table should be reserved and cannot be reused
> > > for another device? As a guess, maybe another subrange from the
> > > higher bridge's total range will be allocated.
> >
> > Nope, the PCIe subsystem doesn't care about any ioremap still active for
> > a range when it is hotplugged.
> >
> > >
> > >> and that's bad news if we keep using it for current drivers. So we
> > >> really have to point all these cpu ptes to some other place.
> > >
> > >
> > > We can't just unmap it without syncing against any in-kernel
> > > accesses to those buffers, and since the page-faulting technique we
> > > use for user-mapped buffers doesn't seem to be possible for
> > > kernel-mapped buffers, I am not sure how to do it gracefully...
> >
> > We could try to replace the kmap with a dummy page under the hood, but
> > that is extremely tricky.
> >
> > Especially since BOs which are just 1 page in size could point to the
> > linear mapping directly.
>
> I think it's just more work. Essentially
> - convert as much as possible of the kernel mappings to vmap_local,
> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
> serve as a barrier, and ofc any new vmap needs to fail or hand out a
> dummy mapping.
> - handle fbcon somehow. I think shutting it all down should work out.

Oh, also for fbdev I think it's best to switch over to
drm_fbdev_generic_setup(). That should handle all the lifetime fun
correctly already, minus the vram complication. So at least fewer
oopses for other reasons :-)
-Daniel

> - worst case keep the system backing storage around for shared dma-buf
> until the other non-dynamic driver releases it. for vram we require
> dynamic importers (and maybe it wasn't such a bright idea to allow
> pinning of importer buffers, might need to revisit that).
>
> Cheers, Daniel
>
> >
> > Christian.
> >
> > >
> > > Andrey
> > >
> > >
> > >> -Daniel
> > >>
> > >>> Christian.
> > >>>
> > >>>> I loaded the driver with vm_update_mode=3,
> > >>>> meaning all VM updates are done using the CPU, and I haven't
> > >>>> seen any OOPSes after removing the device. I guess I can test it
> > >>>> more by allocating GTT and VRAM BOs and trying to read/write to
> > >>>> them after the device is removed.
> > >>>>
> > >>>> Andrey
> > >>>>
> > >>>>
> > >>>>> Regards,
> > >>>>> Christian.
> > >>>>>
> > >>>>>> Andrey
> > >>>>>
> > >>>> _______________________________________________
> > >>>> amd-gfx mailing list
> > >>>> amd-gfx@lists.freedesktop.org
> > >>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > >>>>
> > >>
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-16 17:12                                 ` Daniel Vetter
@ 2020-12-16 18:26                                   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-16 18:26 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher


On 12/16/20 12:12 PM, Daniel Vetter wrote:
> On Wed, Dec 16, 2020 at 5:18 PM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
>>> On 12/16/20 9:21 AM, Daniel Vetter wrote:
>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König
>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
>>>>>> [SNIP]
>>>>>>>> While we can't control user application accesses to the mapped
>>>>>>>> buffers explicitly, and hence we use page fault rerouting,
>>>>>>>> I am thinking that in this case we may be able to sprinkle
>>>>>>>> drm_dev_enter/exit in any such sensitive place where we might
>>>>>>>> CPU-access a DMA buffer from the kernel?
>>>>>>> Yes, I fear we are going to need that.
>>>>>>>
>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW
>>>>>>>> memcpy ? Is there other places ?
>>>>>>> Puh, good question. I have no idea.
>>>>>>>
>>>>>>>> Another point is that at this point the driver shouldn't access any
>>>>>>>> such buffers as we are at the process finishing the device.
>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I
>>>>>>>> don't think there is anything else to do ?
>>>>>>> Well there is a page fault handler for kernel mappings, but that one
>>>>>>> just prints the stack trace into the system log and calls BUG(); :)
>>>>>>>
>>>>>>> Long story short we need to avoid any access to released pages after
>>>>>>> unplug. No matter if it's from the kernel or userspace.
>>>>>> I was just about to start guarding CPU accesses from the kernel to
>>>>>> GTT or VRAM buffers with drm_dev_enter/exit, but then I looked
>>>>>> more into the code, and it seems like ttm_tt_unpopulate just
>>>>>> deletes DMA mappings (for the sake of device to main memory
>>>>>> access). The kernel page table is not touched until the last bo
>>>>>> refcount is dropped and the bo is released
>>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap).
>>>>>> This is the case both for GTT BOs mapped to the kernel by kmap (or
>>>>>> vmap) and for VRAM BOs mapped by ioremap. So as I see it, nothing
>>>>>> bad will happen after we unpopulate a BO while we still try to use
>>>>>> a kernel mapping for it: the system memory pages backing GTT BOs
>>>>>> are still mapped and not freed, and the same holds for the IO
>>>>>> physical ranges of VRAM BOs mapped into the kernel page table,
>>>>>> since iounmap wasn't called yet.
>>>>> The problem is the system pages would be freed, and if the kernel
>>>>> driver still happily writes to them we are pretty much busted,
>>>>> because we write to freed-up memory.
>>>
>>> OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will
>>> release the GTT BO pages. But then isn't there a problem in
>>> ttm_bo_release, since ttm_bo_cleanup_memtype_use, which also leads to
>>> page release, comes before bo->destroy, which unmaps the pages from
>>> the kernel page table? Won't we end up writing to freed memory in
>>> this time interval? Don't we need to postpone freeing the pages until
>>> after the kernel page table unmapping?
>> BOs are only destroyed when there is a guarantee that nobody is
>> accessing them any more.
>>
>> The problem here is that the pages as well as the VRAM can be
>> immediately reused after the hotplug event.
>>
>>>
>>>> Similar for vram: if this is an actual hotunplug and then replug, there's
>>>> going to be a different device behind the same mmio bar range most
>>>> likely (the higher-level bridges all have the same windows assigned),
>>>
>>> No idea how this actually works, but if we haven't called iounmap
>>> yet, doesn't it mean that those physical ranges that are still mapped
>>> into the page table should be reserved and cannot be reused for
>>> another device? As a guess, maybe another subrange from the higher
>>> bridge's total range will be allocated.
>> Nope, the PCIe subsystem doesn't care about any ioremap still active for
>> a range when it is hotplugged.
>>
>>>> and that's bad news if we keep using it for current drivers. So we
>>>> really have to point all these cpu ptes to some other place.
>>>
>>> We can't just unmap it without syncing against any in-kernel accesses
>>> to those buffers, and since the page-faulting technique we use for
>>> user-mapped buffers doesn't seem to be possible for kernel-mapped
>>> buffers, I am not sure how to do it gracefully...
>> We could try to replace the kmap with a dummy page under the hood, but
>> that is extremely tricky.
>>
>> Especially since BOs which are just 1 page in size could point to the
>> linear mapping directly.
> I think it's just more work. Essentially
> - convert as much as possible of the kernel mappings to vmap_local,
> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
> serve as a barrier, and ofc any new vmap needs to fail or hand out a
> dummy mapping.

I read those patches. I am not sure how this helps with protecting
against accesses to released backing pages, or to the IO physical ranges
of a BO that is already mapped, during the unplug event?

Andrey


> - handle fbcon somehow. I think shutting it all down should work out.
> - worst case keep the system backing storage around for shared dma-buf
> until the other non-dynamic driver releases it. for vram we require
> dynamic importers (and maybe it wasn't such a bright idea to allow
> pinning of importer buffers, might need to revisit that).
>
> Cheers, Daniel
>
>> Christian.
>>
>>> Andrey
>>>
>>>
>>>> -Daniel
>>>>
>>>>> Christian.
>>>>>
>>>>>> I loaded the driver with vm_update_mode=3,
>>>>>> meaning all VM updates are done using the CPU, and I haven't seen
>>>>>> any OOPSes after removing the device. I guess I can test it more
>>>>>> by allocating GTT and VRAM BOs and trying to read/write to them
>>>>>> after the device is removed.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Andrey
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-12-16 18:26                                   ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-16 18:26 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: Rob Herring, amd-gfx list, Greg KH, dri-devel, Anholt, Eric,
	Pekka Paalanen, Qiang Yu, Alex Deucher, Wentland, Harry,
	Lucas Stach


On 12/16/20 12:12 PM, Daniel Vetter wrote:
> On Wed, Dec 16, 2020 at 5:18 PM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
>>> On 12/16/20 9:21 AM, Daniel Vetter wrote:
>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König
>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
>>>>>> [SNIP]
>>>>>>>> While we can't control user application accesses to the mapped
>>>>>>>> buffers explicitly, and hence we use page fault rerouting,
>>>>>>>> I am thinking that in this case we may be able to sprinkle
>>>>>>>> drm_dev_enter/exit in any such sensitive place where we might
>>>>>>>> CPU-access a DMA buffer from the kernel?
>>>>>>> Yes, I fear we are going to need that.
>>>>>>>
>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW
>>>>>>>> memcpy ? Is there other places ?
>>>>>>> Puh, good question. I have no idea.
>>>>>>>
>>>>>>>> Another point is that at this point the driver shouldn't access any
>>>>>>>> such buffers as we are at the process finishing the device.
>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I
>>>>>>>> don't think there is anything else to do ?
>>>>>>> Well there is a page fault handler for kernel mappings, but that one
>>>>>>> just prints the stack trace into the system log and calls BUG(); :)
>>>>>>>
>>>>>>> Long story short we need to avoid any access to released pages after
>>>>>>> unplug. No matter if it's from the kernel or userspace.
>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU
>>>>>> accesses from kernel to GTT or VRAM buffers, but then I looked more
>>>>>> in the code and it seems ttm_tt_unpopulate just deletes DMA mappings
>>>>>> (for the sake of device to main memory access). The kernel page table
>>>>>> is not touched until the last bo refcount is dropped and the bo is
>>>>>> released (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap).
>>>>>> This is true both for GTT BOs mapped to kernel by kmap (or vmap) and
>>>>>> for VRAM BOs mapped by ioremap. So as I see it, nothing bad will
>>>>>> happen if we unpopulate a BO while we still use a kernel mapping for
>>>>>> it: the system memory pages backing GTT BOs are still mapped and not
>>>>>> freed, and for VRAM BOs the same holds for the IO physical ranges
>>>>>> mapped into the kernel page table, since iounmap wasn't called yet.
>>>>> The problem is the system pages would be freed, and if the kernel
>>>>> driver still happily writes to them we are pretty much busted, because
>>>>> we write to freed up memory.
>>>
>>> OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will
>>> release the GTT BO pages. But then isn't there a problem in
>>> ttm_bo_release, since ttm_bo_cleanup_memtype_use (which also leads to
>>> page release) comes before bo->destroy, which unmaps the pages from the
>>> kernel page table ? Won't we end up writing to freed memory in this
>>> time interval ? Don't we need to postpone page freeing until after the
>>> kernel page table unmapping ?
>> BOs are only destroyed when there is a guarantee that nobody is
>> accessing them any more.
>>
>> The problem here is that the pages as well as the VRAM can be
>> immediately reused after the hotplug event.
>>
>>>
>>>> Similar for vram, if this is actual hotunplug and then replug, there's
>>>> going to be a different device behind the same mmio bar range most
>>>> likely (the higher bridges all still have the same windows assigned),
>>>
>>> No idea how this actually works but if we haven't called iounmap yet
>>> doesn't it mean that those physical ranges that are still mapped into
>>> page
>>> table should be reserved and cannot be reused for another
>>> device ? As a guess, maybe another subrange from the higher bridge's
>>> total
>>> range will be allocated.
>> Nope, the PCIe subsystem doesn't care about any ioremap still active for
>> a range when it is hotplugged.
>>
>>>> and that's bad news if we keep using it for current drivers. So we
>>>> really have to point all these cpu ptes to some other place.
>>>
>>> We can't just unmap it without syncing against any in-kernel accesses
>>> to those buffers, and since the page faulting technique we use for
>>> user-mapped buffers seems not to be possible for kernel-mapped buffers,
>>> I am not sure how to do it gracefully...
>> We could try to replace the kmap with a dummy page under the hood, but
>> that is extremely tricky.
>>
>> Especially since BOs which are just 1 page in size could point to the
>> linear mapping directly.
> I think it's just more work. Essentially
> - convert as much as possible of the kernel mappings to vmap_local,
> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
> serve as a barrier, and ofc any new vmap needs to fail or hand out a
> dummy mapping.

Read those patches. I am not sure how this helps with protecting
against accesses to the released backing pages, or to the IO physical
ranges of a BO which is already mapped at the time of the unplug event ?

Andrey


> - handle fbcon somehow. I think shutting it all down should work out.
> - worst case keep the system backing storage around for shared dma-buf
> until the other non-dynamic driver releases it. for vram we require
> dynamic importers (and maybe it wasn't such a bright idea to allow
> pinning of importer buffers, might need to revisit that).
>
> Cheers, Daniel
>
>> Christian.
>>
>>> Andrey
>>>
>>>
>>>> -Daniel
>>>>
>>>>> Christian.
>>>>>
>>>>>> I loaded the driver with vm_update_mode=3, meaning all VM updates are
>>>>>> done using the CPU, and haven't seen any OOPSes after removing the
>>>>>> device. I guess I can test it more by allocating GTT and VRAM BOs and
>>>>>> trying to read/write to them after the device is removed.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Andrey
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-16 18:26                                   ` Andrey Grodzovsky
@ 2020-12-16 23:15                                     ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-16 23:15 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher,
	Christian König

On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
> On 12/16/20 12:12 PM, Daniel Vetter wrote:
> > On Wed, Dec 16, 2020 at 5:18 PM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
> >>> On 12/16/20 9:21 AM, Daniel Vetter wrote:
> >>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König
> >>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
> >>>>>> [SNIP]
> >>>>>>>> While we can't control user application accesses to the mapped
> >>>>>>>> buffers explicitly and hence we use page fault rerouting
> >>>>>>>> I am thinking that in this case we may be able to sprinkle
> >>>>>>>> drm_dev_enter/exit in any such sensitive place where we might
> >>>>>>>> CPU access a DMA buffer from the kernel ?
> >>>>>>> Yes, I fear we are going to need that.
> >>>>>>>
> >>>>>>>> Things like CPU page table updates, ring buffer accesses and FW
> >>>>>>>> memcpy ? Is there other places ?
> >>>>>>> Puh, good question. I have no idea.
> >>>>>>>
> >>>>>>>> Another point is that at this point the driver shouldn't access any
> >>>>>>>> such buffers as we are at the process finishing the device.
> >>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I
> >>>>>>>> don't think there is anything else to do ?
> >>>>>>> Well there is a page fault handler for kernel mappings, but that one
> >>>>>>> just prints the stack trace into the system log and calls BUG(); :)
> >>>>>>>
> >>>>>>> Long story short we need to avoid any access to released pages after
> >>>>>>> unplug. No matter if it's from the kernel or userspace.
> >>>>>> I was just about to start guarding with drm_dev_enter/exit CPU
> >>>>>> accesses from kernel to GTT or VRAM buffers, but then I looked more
> >>>>>> in the code and it seems ttm_tt_unpopulate just deletes DMA mappings
> >>>>>> (for the sake of device to main memory access). The kernel page table
> >>>>>> is not touched until the last bo refcount is dropped and the bo is
> >>>>>> released (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap).
> >>>>>> This is true both for GTT BOs mapped to kernel by kmap (or vmap) and
> >>>>>> for VRAM BOs mapped by ioremap. So as I see it, nothing bad will
> >>>>>> happen if we unpopulate a BO while we still use a kernel mapping for
> >>>>>> it: the system memory pages backing GTT BOs are still mapped and not
> >>>>>> freed, and for VRAM BOs the same holds for the IO physical ranges
> >>>>>> mapped into the kernel page table, since iounmap wasn't called yet.
> >>>>> The problem is the system pages would be freed, and if the kernel
> >>>>> driver still happily writes to them we are pretty much busted, because
> >>>>> we write to freed up memory.
> >>>
> >>> OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will
> >>> release the GTT BO pages. But then isn't there a problem in
> >>> ttm_bo_release, since ttm_bo_cleanup_memtype_use (which also leads to
> >>> page release) comes before bo->destroy, which unmaps the pages from the
> >>> kernel page table ? Won't we end up writing to freed memory in this
> >>> time interval ? Don't we need to postpone page freeing until after the
> >>> kernel page table unmapping ?
> >> BOs are only destroyed when there is a guarantee that nobody is
> >> accessing them any more.
> >>
> >> The problem here is that the pages as well as the VRAM can be
> >> immediately reused after the hotplug event.
> >>
> >>>
> >>>> Similar for vram, if this is actual hotunplug and then replug, there's
> >>>> going to be a different device behind the same mmio bar range most
> >>>> likely (the higher bridges all still have the same windows assigned),
> >>>
> >>> No idea how this actually works but if we haven't called iounmap yet
> >>> doesn't it mean that those physical ranges that are still mapped into
> >>> page
> >>> table should be reserved and cannot be reused for another
> >>> device ? As a guess, maybe another subrange from the higher bridge's
> >>> total
> >>> range will be allocated.
> >> Nope, the PCIe subsystem doesn't care about any ioremap still active for
> >> a range when it is hotplugged.
> >>
> >>>> and that's bad news if we keep using it for current drivers. So we
> >>>> really have to point all these cpu ptes to some other place.
> >>>
> >>> We can't just unmap it without syncing against any in-kernel accesses
> >>> to those buffers, and since the page faulting technique we use for
> >>> user-mapped buffers seems not to be possible for kernel-mapped buffers,
> >>> I am not sure how to do it gracefully...
> >> We could try to replace the kmap with a dummy page under the hood, but
> >> that is extremely tricky.
> >>
> >> Especially since BOs which are just 1 page in size could point to the
> >> linear mapping directly.
> > I think it's just more work. Essentially
> > - convert as much as possible of the kernel mappings to vmap_local,
> > which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
> > serve as a barrier, and ofc any new vmap needs to fail or hand out a
> > dummy mapping.
>
> Read those patches. I am not sure how this helps with protecting
> against accesses to the released backing pages, or to the IO physical
> ranges of a BO which is already mapped at the time of the unplug event ?

By eliminating such users, and replacing them with local maps which
are strictly bound in how long they can exist (and hence we can
serialize against them finishing in our hotunplug code). It doesn't
solve all your problems, but it's a tool to get there.
-Daniel

> Andrey
>
>
> > - handle fbcon somehow. I think shutting it all down should work out.
> > - worst case keep the system backing storage around for shared dma-buf
> > until the other non-dynamic driver releases it. for vram we require
> > dynamic importers (and maybe it wasn't such a bright idea to allow
> > pinning of importer buffers, might need to revisit that).
> >
> > Cheers, Daniel
> >
> >> Christian.
> >>
> >>> Andrey
> >>>
> >>>
> >>>> -Daniel
> >>>>
> >>>>> Christian.
> >>>>>
> >>>>>> I loaded the driver with vm_update_mode=3, meaning all VM updates are
> >>>>>> done using the CPU, and haven't seen any OOPSes after removing the
> >>>>>> device. I guess I can test it more by allocating GTT and VRAM BOs and
> >>>>>> trying to read/write to them after the device is removed.
> >>>>>>
> >>>>>> Andrey
> >>>>>>
> >>>>>>
> >>>>>>> Regards,
> >>>>>>> Christian.
> >>>>>>>
> >>>>>>>> Andrey
> >>>>>> _______________________________________________
> >>>>>> amd-gfx mailing list
> >>>>>> amd-gfx@lists.freedesktop.org
> >>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&amp;data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C37b4367cbdaa4133b01d08d8a1e5bf41%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637437355430007196%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=r0bIJS3HUDkFPqyFinAt4eahM%2BjF01DObZ5abgstzSU%3D&amp;reserved=0
> >>>>>>
> >



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-16 23:15                                     ` Daniel Vetter
@ 2020-12-17  0:20                                       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-17  0:20 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher,
	Christian König


On 12/16/20 6:15 PM, Daniel Vetter wrote:
> On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky
> <Andrey.Grodzovsky@amd.com> wrote:
>>
>> On 12/16/20 12:12 PM, Daniel Vetter wrote:
>>> On Wed, Dec 16, 2020 at 5:18 PM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
>>>>> On 12/16/20 9:21 AM, Daniel Vetter wrote:
>>>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König
>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
>>>>>>>> [SNIP]
>>>>>>>>>> While we can't control user application accesses to the mapped
>>>>>>>>>> buffers explicitly and hence we use page fault rerouting
>>>>>>>>>> I am thinking that in this case we may be able to sprinkle
>>>>>>>>>> drm_dev_enter/exit in any such sensitive place where we might
>>>>>>>>>> CPU access a DMA buffer from the kernel ?
>>>>>>>>> Yes, I fear we are going to need that.
>>>>>>>>>
>>>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW
>>>>>>>>>> memcpy ? Is there other places ?
>>>>>>>>> Puh, good question. I have no idea.
>>>>>>>>>
>>>>>>>>>> Another point is that at this point the driver shouldn't access any
>>>>>>>>>> such buffers as we are at the process finishing the device.
>>>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I
>>>>>>>>>> don't think there is anything else to do ?
>>>>>>>>> Well there is a page fault handler for kernel mappings, but that one
>>>>>>>>> just prints the stack trace into the system log and calls BUG(); :)
>>>>>>>>>
>>>>>>>>> Long story short we need to avoid any access to released pages after
>>>>>>>>> unplug. No matter if it's from the kernel or userspace.
>>>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU
>>>>>>>> accesses from kernel to GTT or VRAM buffers, but then I looked more
>>>>>>>> in the code and it seems ttm_tt_unpopulate just deletes DMA mappings
>>>>>>>> (for the sake of device to main memory access). The kernel page table
>>>>>>>> is not touched until the last bo refcount is dropped and the bo is
>>>>>>>> released (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap).
>>>>>>>> This is true both for GTT BOs mapped to kernel by kmap (or vmap) and
>>>>>>>> for VRAM BOs mapped by ioremap. So as I see it, nothing bad will
>>>>>>>> happen if we unpopulate a BO while we still use a kernel mapping for
>>>>>>>> it: the system memory pages backing GTT BOs are still mapped and not
>>>>>>>> freed, and for VRAM BOs the same holds for the IO physical ranges
>>>>>>>> mapped into the kernel page table, since iounmap wasn't called yet.
>>>>>>> The problem is the system pages would be freed, and if the kernel
>>>>>>> driver still happily writes to them we are pretty much busted, because
>>>>>>> we write to freed up memory.
>>>>> OK, I see I missed ttm_tt_unpopulate->..->ttm_pool_free, which will
>>>>> release the GTT BO pages. But then isn't there a problem in
>>>>> ttm_bo_release, since ttm_bo_cleanup_memtype_use (which also leads to
>>>>> page release) comes before bo->destroy, which unmaps the pages from the
>>>>> kernel page table ? Won't we end up writing to freed memory in this
>>>>> time interval ? Don't we need to postpone page freeing until after the
>>>>> kernel page table unmapping ?
>>>> BOs are only destroyed when there is a guarantee that nobody is
>>>> accessing them any more.
>>>>
>>>> The problem here is that the pages as well as the VRAM can be
>>>> immediately reused after the hotplug event.
>>>>
>>>>>> Similar for vram, if this is actual hotunplug and then replug, there's
>>>>>> going to be a different device behind the same mmio bar range most
>>>>>> likely (the higher bridges all still have the same windows assigned),
>>>>> No idea how this actually works but if we haven't called iounmap yet
>>>>> doesn't it mean that those physical ranges that are still mapped into
>>>>> page
>>>>> table should be reserved and cannot be reused for another
>>>>> device ? As a guess, maybe another subrange from the higher bridge's
>>>>> total
>>>>> range will be allocated.
>>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for
>>>> a range when it is hotplugged.
>>>>
>>>>>> and that's bad news if we keep using it for current drivers. So we
>>>>>> really have to point all these cpu ptes to some other place.
>>>>> We can't just unmap it without syncing against any in-kernel accesses
>>>>> to those buffers, and since the page faulting technique we use for
>>>>> user-mapped buffers seems not to be possible for kernel-mapped buffers,
>>>>> I am not sure how to do it gracefully...
>>>> We could try to replace the kmap with a dummy page under the hood, but
>>>> that is extremely tricky.
>>>>
>>>> Especially since BOs which are just 1 page in size could point to the
>>>> linear mapping directly.
>>> I think it's just more work. Essentially
>>> - convert as much as possible of the kernel mappings to vmap_local,
>>> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
>>> serve as a barrier, and ofc any new vmap needs to fail or hand out a
>>> dummy mapping.
>> Read those patches. I am not sure how this helps with protecting
>> against accesses to the released backing pages, or to the IO physical
>> ranges of a BO which is already mapped at the time of the unplug event ?
> By eliminating such users, and replacing them with local maps which
> are strictly bound in how long they can exist (and hence we can
> serialize against them finishing in our hotunplug code).

Not sure I see how serializing against BO map/unmap helps - our problem,
as you described it, is that once the device is extracted and something
else quickly takes its place in the PCI topology and gets assigned the
same physical IO ranges, our driver will start accessing this new device
because our 'zombie' BOs are still pointing to those ranges.

Another point regarding serializing - the problem is that some of those
BOs are very long lived. Take for example the HW command ring buffer
Christian mentioned before (amdgpu_ring_init->amdgpu_bo_create_kernel):
its life span is basically the entire time the device exists, and it's
destroyed only in the SW fini stage (when the last drm_dev reference is
dropped). So should I grab its dma_resv_lock from the amdgpu_pci_remove
code and wait for it to be unmapped before proceeding with the PCI remove
code ? That can take unbounded time, which is why I don't understand how
serializing will help.

Andrey


> It doesn't
> solve all your problems, but it's a tool to get there.
> -Daniel
>
>> Andrey
>>
>>
>>> - handle fbcon somehow. I think shutting it all down should work out.
>>> - worst case keep the system backing storage around for shared dma-buf
>>> until the other non-dynamic driver releases it. for vram we require
>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>> pinning of importer buffers, might need to revisit that).
>>>
>>> Cheers, Daniel
>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> I loaded the driver with vm_update_mode=3, meaning all VM updates are
>>>>>>>> done using the CPU, and haven't seen any OOPSes after removing the
>>>>>>>> device. I guess I can test it more by allocating GTT and VRAM BOs and
>>>>>>>> trying to read/write to them after the device is removed.
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>> _______________________________________________
>>>>>>>> amd-gfx mailing list
>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>
>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

>>>>> doesn't it mean that those physical ranges that are still mapped into
>>>>> page
>>>>> table should be reserved and cannot be reused for another
>>>>> device ? As a guess, maybe another subrange from the higher bridge's
>>>>> total
>>>>> range will be allocated.
>>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for
>>>> a range when it is hotplugged.
>>>>
>>>>>> and that's bad news if we keep using it for current drivers. So we
>>>>>> really have to point all these cpu ptes to some other place.
>>>>> We can't just unmap it without syncing against any in kernel accesses
>>>>> to those buffers
>>>>> and since page faulting technique we use for user mapped buffers seems
>>>>> to not be possible
>>>>> for kernel mapped buffers I am not sure how to do it gracefully...
>>>> We could try to replace the kmap with a dummy page under the hood, but
>>>> that is extremely tricky.
>>>>
>>>> Especially since BOs which are just 1 page in size could point to the
>>>> linear mapping directly.
>>> I think it's just more work. Essentially
>>> - convert as much as possible of the kernel mappings to vmap_local,
>>> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
>>> serve as a barrier, and ofc any new vmap needs to fail or hand out a
>>> dummy mapping.
>> Read those patches. I am not sure how this helps with protecting
>> against accesses to released backing pages or IO physical ranges of BO
>> which is already mapped during the unplug event ?
> By eliminating such users, and replacing them with local maps which
> are strictly bound in how long they can exist (and hence we can
> serialize against them finishing in our hotunplug code).

Not sure I see how serializing against BO map/unmap helps - our problem, as you
described it, is that once the device is extracted and something else quickly
takes its place in the PCI topology and gets assigned the same physical IO
ranges, our driver will start accessing this new device because our 'zombie'
BOs are still pointing to those ranges.

Another point regarding serializing - the problem is that some of those BOs are
very long lived. Take for example the HW command ring buffer Christian mentioned
before (amdgpu_ring_init->amdgpu_bo_create_kernel): its lifespan is basically
the entire time the device exists, and it is destroyed only in the SW fini stage
(when the last drm_dev reference is dropped). So should I grab its dma_resv_lock
from the amdgpu_pci_remove code and wait for it to be unmapped before proceeding
with the PCI remove code? That can take unbounded time, which is why I don't
understand how serializing will help.

Andrey


> It doesn't
> solve all your problems, but it's a tool to get there.
> -Daniel
>
>> Andrey
>>
>>
>>> - handle fbcon somehow. I think shutting it all down should work out.
>>> - worst case keep the system backing storage around for shared dma-buf
>>> until the other non-dynamic driver releases it. for vram we require
>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>> pinning of importer buffers, might need to revisit that).
>>>
>>> Cheers, Daniel
>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> I loaded the driver with vm_update_mode=3
>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after
>>>>>>>> removing the device. I guess i can test it more by allocating GTT and
>>>>>>>> VRAM BOs
>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>> _______________________________________________
>>>>>>>> amd-gfx mailing list
>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>
>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17  0:20                                       ` Andrey Grodzovsky
@ 2020-12-17 12:01                                         ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-17 12:01 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Daniel Vetter, dri-devel, amd-gfx list, Greg KH, Alex Deucher,
	Qiang Yu, Christian König

On Wed, Dec 16, 2020 at 07:20:02PM -0500, Andrey Grodzovsky wrote:
> 
> On 12/16/20 6:15 PM, Daniel Vetter wrote:
> > On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky@amd.com> wrote:
> > > 
> > > On 12/16/20 12:12 PM, Daniel Vetter wrote:
> > > > On Wed, Dec 16, 2020 at 5:18 PM Christian König
> > > > <christian.koenig@amd.com> wrote:
> > > > > Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
> > > > > > On 12/16/20 9:21 AM, Daniel Vetter wrote:
> > > > > > > On Wed, Dec 16, 2020 at 9:04 AM Christian König
> > > > > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > > > > Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
> > > > > > > > > [SNIP]
> > > > > > > > > > > While we can't control user application accesses to the mapped
> > > > > > > > > > > buffers explicitly and hence we use page fault rerouting
> > > > > > > > > > > I am thinking that in this  case we may be able to sprinkle
> > > > > > > > > > > drm_dev_enter/exit in any such sensitive place were we might
> > > > > > > > > > > CPU access a DMA buffer from the kernel ?
> > > > > > > > > > Yes, I fear we are going to need that.
> > > > > > > > > > 
> > > > > > > > > > > Things like CPU page table updates, ring buffer accesses and FW
> > > > > > > > > > > memcpy ? Is there other places ?
> > > > > > > > > > Puh, good question. I have no idea.
> > > > > > > > > > 
> > > > > > > > > > > Another point is that at this point the driver shouldn't access any
> > > > > > > > > > > such buffers as we are at the process finishing the device.
> > > > > > > > > > > AFAIK there is no page fault mechanism for kernel mappings so I
> > > > > > > > > > > don't think there is anything else to do ?
> > > > > > > > > > Well there is a page fault handler for kernel mappings, but that one
> > > > > > > > > > just prints the stack trace into the system log and calls BUG(); :)
> > > > > > > > > > 
> > > > > > > > > > Long story short we need to avoid any access to released pages after
> > > > > > > > > > unplug. No matter if it's from the kernel or userspace.
> > > > > > > > > I was just about to start guarding with drm_dev_enter/exit CPU
> > > > > > > > > accesses from kernel to GTT ot VRAM buffers but then i looked more in
> > > > > > > > > the code
> > > > > > > > > and seems like ttm_tt_unpopulate just deletes DMA mappings (for the
> > > > > > > > > sake of device to main memory access). Kernel page table is not
> > > > > > > > > touched
> > > > > > > > > until last bo refcount is dropped and the bo is released
> > > > > > > > > (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
> > > > > > > > > is both
> > > > > > > > > for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped
> > > > > > > > > by ioremap. So as i see it, nothing will bad will happen after we
> > > > > > > > > unpopulate a BO while we still try to use a kernel mapping for it,
> > > > > > > > > system memory pages backing GTT BOs are still mapped and not freed and
> > > > > > > > > for
> > > > > > > > > VRAM BOs same is for the IO physical ranges mapped into the kernel
> > > > > > > > > page table since iounmap wasn't called yet.
> > > > > > > > The problem is the system pages would be freed and if we kernel driver
> > > > > > > > still happily write to them we are pretty much busted because we write
> > > > > > > > to freed up memory.
> > > > > > OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will
> > > > > > release
> > > > > > the GTT BO pages. But then isn't there a problem in ttm_bo_release since
> > > > > > ttm_bo_cleanup_memtype_use which also leads to pages release comes
> > > > > > before bo->destroy which unmaps the pages from kernel page table ? Won't
> > > > > > we have end up writing to freed memory in this time interval ? Don't we
> > > > > > need to postpone pages freeing to after kernel page table unmapping ?
> > > > > BOs are only destroyed when there is a guarantee that nobody is
> > > > > accessing them any more.
> > > > > 
> > > > > The problem here is that the pages as well as the VRAM can be
> > > > > immediately reused after the hotplug event.
> > > > > 
> > > > > > > Similar for vram, if this is actual hotunplug and then replug, there's
> > > > > > > going to be a different device behind the same mmio bar range most
> > > > > > > likely (the higher bridges all this have the same windows assigned),
> > > > > > No idea how this actually works but if we haven't called iounmap yet
> > > > > > doesn't it mean that those physical ranges that are still mapped into
> > > > > > page
> > > > > > table should be reserved and cannot be reused for another
> > > > > > device ? As a guess, maybe another subrange from the higher bridge's
> > > > > > total
> > > > > > range will be allocated.
> > > > > Nope, the PCIe subsystem doesn't care about any ioremap still active for
> > > > > a range when it is hotplugged.
> > > > > 
> > > > > > > and that's bad news if we keep using it for current drivers. So we
> > > > > > > really have to point all these cpu ptes to some other place.
> > > > > > We can't just unmap it without syncing against any in kernel accesses
> > > > > > to those buffers
> > > > > > and since page faulting technique we use for user mapped buffers seems
> > > > > > to not be possible
> > > > > > for kernel mapped buffers I am not sure how to do it gracefully...
> > > > > We could try to replace the kmap with a dummy page under the hood, but
> > > > > that is extremely tricky.
> > > > > 
> > > > > Especially since BOs which are just 1 page in size could point to the
> > > > > linear mapping directly.
> > > > I think it's just more work. Essentially
> > > > - convert as much as possible of the kernel mappings to vmap_local,
> > > > which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
> > > > serve as a barrier, and ofc any new vmap needs to fail or hand out a
> > > > dummy mapping.
> > > Read those patches. I am not sure how this helps with protecting
> > > against accesses to released backing pages or IO physical ranges of BO
> > > which is already mapped during the unplug event ?
> > By eliminating such users, and replacing them with local maps which
> > are strictly bound in how long they can exist (and hence we can
> > serialize against them finishing in our hotunplug code).
> 
> Not sure I see how serializing against BO map/unmap helps -  our problem as
> you described is that once
> device is extracted and then something else quickly takes it's place in the
> PCI topology
> and gets assigned same physical IO ranges, then our driver will start accessing this
> new device because our 'zombie' BOs are still pointing to those ranges.

Until your driver's remove callback is finished the ranges stay reserved.
If that were not the case, hotunplug would be fundamentally impossible
to handle correctly.

Of course all the mmio actions will time out, so it might take some time
to get through it all.

> Another point regarding serializing - problem  is that some of those BOs are
> very long lived, take for example the HW command
> ring buffer Christian mentioned before -
> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span
> is basically for the entire time the device exists, it's destroyed only in
> the SW fini stage (when last drm_dev
> reference is dropped) and so should I grab it's dma_resv_lock from
> amdgpu_pci_remove code and wait
> for it to be unmapped before proceeding with the PCI remove code ? This can
> take unbound time and that why I don't understand
> how serializing will help.

Uh, you need to untangle that. After hw cleanup is done no one is allowed
to touch that ringbuffer bo anymore from the kernel. That's what the
drm_dev_enter/exit guards are for. Like you say, we can't wait for all sw
references to disappear.

The vmap_local is for mappings done by other drivers, through the dma-buf
interface (where "other drivers" can include fbdev/fbcon, if you use the
generic helpers).
-Daniel

> 
> Andrey
> 
> 
> > It doesn't
> > solve all your problems, but it's a tool to get there.
> > -Daniel
> > 
> > > Andrey
> > > 
> > > 
> > > > - handle fbcon somehow. I think shutting it all down should work out.
> > > > - worst case keep the system backing storage around for shared dma-buf
> > > > until the other non-dynamic driver releases it. for vram we require
> > > > dynamic importers (and maybe it wasn't such a bright idea to allow
> > > > pinning of importer buffers, might need to revisit that).
> > > > 
> > > > Cheers, Daniel
> > > > 
> > > > > Christian.
> > > > > 
> > > > > > Andrey
> > > > > > 
> > > > > > 
> > > > > > > -Daniel
> > > > > > > 
> > > > > > > > Christian.
> > > > > > > > 
> > > > > > > > > I loaded the driver with vm_update_mode=3
> > > > > > > > > meaning all VM updates done using CPU and hasn't seen any OOPs after
> > > > > > > > > removing the device. I guess i can test it more by allocating GTT and
> > > > > > > > > VRAM BOs
> > > > > > > > > and trying to read/write to them after device is removed.
> > > > > > > > > 
> > > > > > > > > Andrey
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > Regards,
> > > > > > > > > > Christian.
> > > > > > > > > > 
> > > > > > > > > > > Andrey
> > > > > > > > > _______________________________________________
> > > > > > > > > amd-gfx mailing list
> > > > > > > > > amd-gfx@lists.freedesktop.org
> > > > > > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > > > > > > > 
> > 
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17 12:01                                         ` Daniel Vetter
@ 2020-12-17 19:19                                           ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-17 19:19 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Daniel Vetter, dri-devel, amd-gfx list, Greg KH, Alex Deucher,
	Qiang Yu, Christian König


On 12/17/20 7:01 AM, Daniel Vetter wrote:
> On Wed, Dec 16, 2020 at 07:20:02PM -0500, Andrey Grodzovsky wrote:
>> On 12/16/20 6:15 PM, Daniel Vetter wrote:
>>> On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky
>>> <Andrey.Grodzovsky@amd.com> wrote:
>>>> On 12/16/20 12:12 PM, Daniel Vetter wrote:
>>>>> On Wed, Dec 16, 2020 at 5:18 PM Christian König
>>>>> <christian.koenig@amd.com> wrote:
>>>>>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
>>>>>>> On 12/16/20 9:21 AM, Daniel Vetter wrote:
>>>>>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König
>>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
>>>>>>>>>> [SNIP]
>>>>>>>>>>>> While we can't control user application accesses to the mapped
>>>>>>>>>>>> buffers explicitly and hence we use page fault rerouting
>>>>>>>>>>>> I am thinking that in this  case we may be able to sprinkle
>>>>>>>>>>>> drm_dev_enter/exit in any such sensitive place were we might
>>>>>>>>>>>> CPU access a DMA buffer from the kernel ?
>>>>>>>>>>> Yes, I fear we are going to need that.
>>>>>>>>>>>
>>>>>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW
>>>>>>>>>>>> memcpy ? Is there other places ?
>>>>>>>>>>> Puh, good question. I have no idea.
>>>>>>>>>>>
>>>>>>>>>>>> Another point is that at this point the driver shouldn't access any
>>>>>>>>>>>> such buffers as we are at the process finishing the device.
>>>>>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I
>>>>>>>>>>>> don't think there is anything else to do ?
>>>>>>>>>>> Well there is a page fault handler for kernel mappings, but that one
>>>>>>>>>>> just prints the stack trace into the system log and calls BUG(); :)
>>>>>>>>>>>
>>>>>>>>>>> Long story short we need to avoid any access to released pages after
>>>>>>>>>>> unplug. No matter if it's from the kernel or userspace.
>>>>>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU
>>>>>>>>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in
>>>>>>>>>> the code
>>>>>>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the
>>>>>>>>>> sake of device to main memory access). Kernel page table is not
>>>>>>>>>> touched
>>>>>>>>>> until last bo refcount is dropped and the bo is released
>>>>>>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
>>>>>>>>>> is both
>>>>>>>>>> for GTT BOs maped to kernel by kmap (or vmap) and for VRAM BOs mapped
>>>>>>>>>> by ioremap. So as i see it, nothing will bad will happen after we
>>>>>>>>>> unpopulate a BO while we still try to use a kernel mapping for it,
>>>>>>>>>> system memory pages backing GTT BOs are still mapped and not freed and
>>>>>>>>>> for
>>>>>>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel
>>>>>>>>>> page table since iounmap wasn't called yet.
>>>>>>>>> The problem is the system pages would be freed and if we kernel driver
>>>>>>>>> still happily write to them we are pretty much busted because we write
>>>>>>>>> to freed up memory.
>>>>>>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will
>>>>>>> release
>>>>>>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since
>>>>>>> ttm_bo_cleanup_memtype_use which also leads to pages release comes
>>>>>>> before bo->destroy which unmaps the pages from kernel page table ? Won't
>>>>>>> we have end up writing to freed memory in this time interval ? Don't we
>>>>>>> need to postpone pages freeing to after kernel page table unmapping ?
>>>>>> BOs are only destroyed when there is a guarantee that nobody is
>>>>>> accessing them any more.
>>>>>>
>>>>>> The problem here is that the pages as well as the VRAM can be
>>>>>> immediately reused after the hotplug event.
>>>>>>
>>>>>>>> Similar for vram, if this is actual hotunplug and then replug, there's
>>>>>>>> going to be a different device behind the same mmio bar range most
>>>>>>>> likely (the higher bridges all this have the same windows assigned),
>>>>>>> No idea how this actually works but if we haven't called iounmap yet
>>>>>>> doesn't it mean that those physical ranges that are still mapped into
>>>>>>> page
>>>>>>> table should be reserved and cannot be reused for another
>>>>>>> device ? As a guess, maybe another subrange from the higher bridge's
>>>>>>> total
>>>>>>> range will be allocated.
>>>>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for
>>>>>> a range when it is hotplugged.
>>>>>>
>>>>>>>> and that's bad news if we keep using it for current drivers. So we
>>>>>>>> really have to point all these cpu ptes to some other place.
>>>>>>> We can't just unmap it without syncing against any in kernel accesses
>>>>>>> to those buffers
>>>>>>> and since page faulting technique we use for user mapped buffers seems
>>>>>>> to not be possible
>>>>>>> for kernel mapped buffers I am not sure how to do it gracefully...
>>>>>> We could try to replace the kmap with a dummy page under the hood, but
>>>>>> that is extremely tricky.
>>>>>>
>>>>>> Especially since BOs which are just 1 page in size could point to the
>>>>>> linear mapping directly.
>>>>> I think it's just more work. Essentially
>>>>> - convert as much as possible of the kernel mappings to vmap_local,
>>>>> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
>>>>> serve as a barrier, and ofc any new vmap needs to fail or hand out a
>>>>> dummy mapping.
>>>> Read those patches. I am not sure how this helps with protecting
>>>> against accesses to released backing pages or IO physical ranges of BO
>>>> which is already mapped during the unplug event ?
>>> By eliminating such users, and replacing them with local maps which
>>> are strictly bound in how long they can exist (and hence we can
>>> serialize against them finishing in our hotunplug code).
>> Not sure I see how serializing against BO map/unmap helps - our problem as
>> you described is that once the
>> device is extracted and something else quickly takes its place in the
>> PCI topology
>> and gets assigned the same physical IO ranges, our driver will start accessing this
>> new device because our 'zombie' BOs are still pointing to those ranges.
> Until your driver's remove callback is finished the ranges stay reserved.


The ranges stay reserved until unmapped, which happens in bo->destroy,
which for most internally allocated buffers is during sw_fini, when the last
drm_put is called.


> If that's not the case, then hotunplug would be fundamentally impossible
> to handle correctly.
>
> Of course all the mmio actions will time out, so it might take some time
> to get through it all.


I found that the PCI code provides the pci_device_is_present() function,
which we can use to avoid timeouts - it reads the device vendor ID and checks
whether all 1s are returned.
We could call it from within the register accessors before attempting a read/write.


>
>> Another point regarding serializing - the problem is that some of those BOs are
>> very long lived; take for example the HW command
>> ring buffer Christian mentioned before
>> (amdgpu_ring_init->amdgpu_bo_create_kernel). Its life span
>> is basically the entire time the device exists; it's destroyed only in
>> the SW fini stage (when the last drm_dev
>> reference is dropped). So should I grab its dma_resv_lock from the
>> amdgpu_pci_remove code and wait
>> for it to be unmapped before proceeding with the PCI remove code? This can
>> take unbounded time, and that's why I don't understand
>> how serializing will help.
> Uh you need to untangle that. After hw cleanup is done no one is allowed
> to touch that ringbuffer bo anymore from the kernel.


I would assume we are not allowed to touch it once we have identified that the
device is gone, in order to minimize the chance of accidental writes to some
other device which might now occupy those IO ranges?


>   That's what
> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
> references to disappear.


Yes, it didn't make sense to me why we would use vmap_local for internally
allocated buffers. I think we should also guard register reads/writes for the
same reason as above.


>
> The vmap_local is for mappings done by other drivers, through the dma-buf
> interface (where "other drivers" can include fbdev/fbcon, if you use the
> generic helpers).
> -Daniel


Ok, so I assumed that with vmap_local you were trying to solve the problem of
quick reinsertion of another device into the same MMIO range that my driver
still points to, but are you actually trying to solve the issue of exported
dma-bufs outliving the device? For this we have the drm_device refcount in the
GEM layer, I think.

Andrey


>
>> Andrey
>>
>>
>>> It doesn't
>>> solve all your problems, but it's a tool to get there.
>>> -Daniel
>>>
>>>> Andrey
>>>>
>>>>
>>>>> - handle fbcon somehow. I think shutting it all down should work out.
>>>>> - worst case keep the system backing storage around for shared dma-buf
>>>>> until the other non-dynamic driver releases it. for vram we require
>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>>>> pinning of importer buffers, might need to revisit that).
>>>>>
>>>>> Cheers, Daniel
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>>> I loaded the driver with vm_update_mode=3
>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after
>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and
>>>>>>>>>> VRAM BOs
>>>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>> _______________________________________________
>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>>
>>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17 19:19                                           ` Andrey Grodzovsky
@ 2020-12-17 20:10                                             ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2020-12-17 20:10 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: amd-gfx list, Daniel Vetter, dri-devel, Qiang Yu, Greg KH, Alex Deucher

[SNIP]
>>> By eliminating such users, and replacing them with local maps which
>>>> are strictly bound in how long they can exist (and hence we can
>>>> serialize against them finishing in our hotunplug code).
>>> Not sure I see how serializing against BO map/unmap helps - our 
>>> problem as
>>> you described is that once
>>> device is extracted and then something else quickly takes it's place 
>>> in the
>>> PCI topology
>>> and gets assigned same physical IO ranges, then our driver will 
>>> start accessing this
>>> new device because our 'zombie' BOs are still pointing to those ranges.
>> Until your driver's remove callback is finished the ranges stay 
>> reserved.
>
>
> The ranges stay reserved until unmapped which happens in bo->destroy

I'm not sure of that. Why do you think that?

> which for most internally allocated  buffers is during sw_fini when 
> last drm_put
> is called.
>
>
>> If that's not the case, then hotunplug would be fundamentally impossible
>>> to handle correctly.
>>
>> Of course all the mmio actions will time out, so it might take some time
>> to get through it all.
>
>
> I found that PCI code provides pci_device_is_present function
> we can use to avoid timeouts - it reads device vendor and checks if 
> all 1s is returned
> or not. We can call it from within register accessors before trying 
> read/write

That's way too much overhead! We need to keep it much lower or it will 
result in quite a performance drop.

I suggest rather thinking about adding drm_dev_enter/exit guards.

Christian.

>
>>> Another point regarding serializing - problem  is that some of those 
>>> BOs are
>>> very long lived, take for example the HW command
>>> ring buffer Christian mentioned before -
>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span
>>> is basically for the entire time the device exists, it's destroyed 
>>> only in
>>> the SW fini stage (when last drm_dev
>>> reference is dropped) and so should I grab it's dma_resv_lock from
>>> amdgpu_pci_remove code and wait
>>> for it to be unmapped before proceeding with the PCI remove code ? 
>>> This can
>>> take unbound time and that why I don't understand
>>> how serializing will help.
>> Uh you need to untangle that. After hw cleanup is done no one is allowed
>> to touch that ringbuffer bo anymore from the kernel.
>
>
> I would assume we are not allowed to touch it once we identified the 
> device is
> gone in order to minimize the chance of accidental writes to some 
> other device which might now
> occupy those IO ranges ?
>
>
>>   That's what
>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
>> references to disappear.
>
>
> Yes, didn't make sense to me why would we use vmap_local for internally
> allocated buffers. I think we should also guard registers read/writes 
> for the
> same reason as above.
>
>
>>
>> The vmap_local is for mappings done by other drivers, through the 
>> dma-buf
>> interface (where "other drivers" can include fbdev/fbcon, if you use the
>> generic helpers).
>> -Daniel
>
>
> Ok, so I assumed that with vmap_local you were trying to solve the 
> problem of quick reinsertion
> of another device into same MMIO range that my driver still points too 
> but actually are you trying to solve
> the issue of exported dma buffers outliving the device ? For this we 
> have drm_device refcount in the GEM layer
> i think.
>
> Andrey
>
>
>>
>>> Andrey
>>>
>>>
>>>> It doesn't
>>>> solve all your problems, but it's a tool to get there.
>>>> -Daniel
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>> - handle fbcon somehow. I think shutting it all down should work 
>>>>>> out.
>>>>>> - worst case keep the system backing storage around for shared 
>>>>>> dma-buf
>>>>>> until the other non-dynamic driver releases it. for vram we require
>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>>>>> pinning of importer buffers, might need to revisit that).
>>>>>>
>>>>>> Cheers, Daniel
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>> -Daniel
>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>> I loaded the driver with vm_update_mode=3
>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any 
>>>>>>>>>>> OOPs after
>>>>>>>>>>> removing the device. I guess i can test it more by 
>>>>>>>>>>> allocating GTT and
>>>>>>>>>>> VRAM BOs
>>>>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Christian.
>>>>>>>>>>>>
>>>>>>>>>>>>> Andrey
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>>>
>>>>>>>>>>>
>>>>


^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17 20:10                                             ` Christian König
@ 2020-12-17 20:38                                               ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-17 20:38 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: amd-gfx list, Daniel Vetter, dri-devel, Qiang Yu, Greg KH, Alex Deucher


On 12/17/20 3:10 PM, Christian König wrote:
> [SNIP]
>>>> By eliminating such users, and replacing them with local maps which
>>>>> are strictly bound in how long they can exist (and hence we can
>>>>> serialize against them finishing in our hotunplug code).
>>>> Not sure I see how serializing against BO map/unmap helps - our problem as
>>>> you described is that once
>>>> device is extracted and then something else quickly takes it's place in the
>>>> PCI topology
>>>> and gets assigned same physical IO ranges, then our driver will start 
>>>> accessing this
>>>> new device because our 'zombie' BOs are still pointing to those ranges.
>>> Until your driver's remove callback is finished the ranges stay reserved.
>>
>>
>> The ranges stay reserved until unmapped which happens in bo->destroy
>
> I'm not sure of that. Why do you think that?


Because of this sequence:
ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap.
Is there another place I am missing?


>
>> which for most internally allocated buffers is during sw_fini when last drm_put
>> is called.
>>
>>
>>> If that's not the case, then hotunplug would be fundamentally impossible
>>> to handle correctly.
>>>
>>> Of course all the mmio actions will time out, so it might take some time
>>> to get through it all.
>>
>>
>> I found that PCI code provides pci_device_is_present function
>> we can use to avoid timeouts - it reads device vendor and checks if all 1s is 
>> returned
>> or not. We can call it from within register accessors before trying read/write
>
> That's way to much overhead! We need to keep that much lower or it will result 
> in quite a performance drop.
>
> I suggest to rather think about adding drm_dev_enter/exit guards.


Sure, this one is just a bit further upstream of the disconnect event. In the end,
none of them is watertight.

Andrey


>
> Christian.
>
>>
>>>> Another point regarding serializing - problem  is that some of those BOs are
>>>> very long lived, take for example the HW command
>>>> ring buffer Christian mentioned before -
>>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span
>>>> is basically for the entire time the device exists, it's destroyed only in
>>>> the SW fini stage (when last drm_dev
>>>> reference is dropped) and so should I grab its dma_resv_lock from
>>>> amdgpu_pci_remove code and wait
>>>> for it to be unmapped before proceeding with the PCI remove code ? This can
>>>> take unbounded time and that's why I don't understand
>>>> how serializing will help.
>>> Uh you need to untangle that. After hw cleanup is done no one is allowed
>>> to touch that ringbuffer bo anymore from the kernel.
>>
>>
>> I would assume we are not allowed to touch it once we identified the device is
>> gone in order to minimize the chance of accidental writes to some other 
>> device which might now
>> occupy those IO ranges ?
>>
>>
>>>   That's what
>>> drm_dev_enter/exit guards are for. Like you say we can't wait for all sw
>>> references to disappear.
>>
>>
>> Yes, it didn't make sense to me why we would use vmap_local for internally
>> allocated buffers. I think we should also guard register reads/writes for the
>> same reason as above.
>>
>>
>>>
>>> The vmap_local is for mappings done by other drivers, through the dma-buf
>>> interface (where "other drivers" can include fbdev/fbcon, if you use the
>>> generic helpers).
>>> -Daniel
>>
>>
>> Ok, so I assumed that with vmap_local you were trying to solve the problem of 
>> quick reinsertion
>> of another device into the same MMIO range that my driver still points to, but 
>> actually are you trying to solve
>> the issue of exported dma buffers outliving the device ? For this we have 
>> drm_device refcount in the GEM layer
>> I think.
>>
>> Andrey
>>
>>
>>>
>>>> Andrey
>>>>
>>>>
>>>>> It doesn't
>>>>> solve all your problems, but it's a tool to get there.
>>>>> -Daniel
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> - handle fbcon somehow. I think shutting it all down should work out.
>>>>>>> - worst case keep the system backing storage around for shared dma-buf
>>>>>>> until the other non-dynamic driver releases it. for vram we require
>>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>>>>>> pinning of importer buffers, might need to revisit that).
>>>>>>>
>>>>>>> Cheers, Daniel
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> I loaded the driver with vm_update_mode=3
>>>>>>>>>>>> meaning all VM updates are done using the CPU, and I haven't seen any OOPSes after
>>>>>>>>>>>> removing the device. I guess I can test it more by allocating GTT and
>>>>>>>>>>>> VRAM BOs
>>>>>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-12-17 20:38                                               ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-17 20:38 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: Rob Herring, amd-gfx list, Daniel Vetter, dri-devel, Anholt,
	Eric, Pekka Paalanen, Qiang Yu, Greg KH, Alex Deucher, Wentland,
	Harry, Lucas Stach


On 12/17/20 3:10 PM, Christian König wrote:
> [SNIP]
>>>> By eliminating such users, and replacing them with local maps which
>>>>> are strictly bound in how long they can exist (and hence we can
>>>>> serialize against them finishing in our hotunplug code).
>>>> Not sure I see how serializing against BO map/unmap helps - our problem as
>>>> you described is that once
>>>> device is extracted and then something else quickly takes its place in the
>>>> PCI topology
>>>> and gets assigned same physical IO ranges, then our driver will start 
>>>> accessing this
>>>> new device because our 'zombie' BOs are still pointing to those ranges.
>>> Until your driver's remove callback is finished the ranges stay reserved.
>>
>>
>> The ranges stay reserved until unmapped which happens in bo->destroy
>
> I'm not sure of that. Why do you think that?


Because of this sequence 
ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap
Is there another place I am missing ?


>
>> which for most internally allocated buffers is during sw_fini when last drm_put
>> is called.
>>
>>
>>> If that's not the case, then hotunplug would be fundamentally impossible
>>> to handle correctly.
>>>
>>> Of course all the mmio actions will time out, so it might take some time
>>> to get through it all.
>>
>>
>> I found that PCI code provides a pci_device_is_present function
>> we can use to avoid timeouts - it reads the device vendor ID and checks whether
>> all 1s are returned. We can call it from within register accessors before
>> trying a read/write
>
> That's way too much overhead! We need to keep that much lower or it will result 
> in quite a performance drop.
>
> I suggest to rather think about adding drm_dev_enter/exit guards.


Sure, this check is just a bit upstream of the disconnect event. Ultimately none 
of these approaches is watertight.

Andrey


>
> Christian.
>
>>
>>>> Another point regarding serializing - problem  is that some of those BOs are
>>>> very long lived, take for example the HW command
>>>> ring buffer Christian mentioned before -
>>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), its life span
>>>> is basically for the entire time the device exists, it's destroyed only in
>>>> the SW fini stage (when last drm_dev
>>>> reference is dropped) and so should I grab its dma_resv_lock from
>>>> amdgpu_pci_remove code and wait
>>>> for it to be unmapped before proceeding with the PCI remove code ? This can
>>>> take unbounded time and that's why I don't understand
>>>> how serializing will help.
>>> Uh you need to untangle that. After hw cleanup is done no one is allowed
>>> to touch that ringbuffer bo anymore from the kernel.
>>
>>
>> I would assume we are not allowed to touch it once we identified the device is
>> gone in order to minimize the chance of accidental writes to some other 
>> device which might now
>> occupy those IO ranges ?
>>
>>
>>>   That's what
>>> drm_dev_enter/exit guards are for. Like you say we can't wait for all sw
>>> references to disappear.
>>
>>
>> Yes, it didn't make sense to me why we would use vmap_local for internally
>> allocated buffers. I think we should also guard register reads/writes for the
>> same reason as above.
>>
>>
>>>
>>> The vmap_local is for mappings done by other drivers, through the dma-buf
>>> interface (where "other drivers" can include fbdev/fbcon, if you use the
>>> generic helpers).
>>> -Daniel
>>
>>
>> Ok, so I assumed that with vmap_local you were trying to solve the problem of 
>> quick reinsertion
>> of another device into the same MMIO range that my driver still points to, but 
>> actually are you trying to solve
>> the issue of exported dma buffers outliving the device ? For this we have 
>> drm_device refcount in the GEM layer
>> I think.
>>
>> Andrey
>>
>>
>>>
>>>> Andrey
>>>>
>>>>
>>>>> It doesn't
>>>>> solve all your problems, but it's a tool to get there.
>>>>> -Daniel
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> - handle fbcon somehow. I think shutting it all down should work out.
>>>>>>> - worst case keep the system backing storage around for shared dma-buf
>>>>>>> until the other non-dynamic driver releases it. for vram we require
>>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>>>>>> pinning of importer buffers, might need to revisit that).
>>>>>>>
>>>>>>> Cheers, Daniel
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> I loaded the driver with vm_update_mode=3
>>>>>>>>>>>> meaning all VM updates are done using the CPU, and I haven't seen any OOPSes after
>>>>>>>>>>>> removing the device. I guess I can test it more by allocating GTT and
>>>>>>>>>>>> VRAM BOs
>>>>>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17 19:19                                           ` Andrey Grodzovsky
@ 2020-12-17 20:42                                             ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-17 20:42 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher,
	Christian König

On Thu, Dec 17, 2020 at 8:19 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
> On 12/17/20 7:01 AM, Daniel Vetter wrote:
> > On Wed, Dec 16, 2020 at 07:20:02PM -0500, Andrey Grodzovsky wrote:
> >> On 12/16/20 6:15 PM, Daniel Vetter wrote:
> >>> On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky
> >>> <Andrey.Grodzovsky@amd.com> wrote:
> >>>> On 12/16/20 12:12 PM, Daniel Vetter wrote:
> >>>>> On Wed, Dec 16, 2020 at 5:18 PM Christian König
> >>>>> <christian.koenig@amd.com> wrote:
> >>>>>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
> >>>>>>> On 12/16/20 9:21 AM, Daniel Vetter wrote:
> >>>>>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König
> >>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>>>>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
> >>>>>>>>>> [SNIP]
> >>>>>>>>>>>> While we can't control user application accesses to the mapped
> >>>>>>>>>>>> buffers explicitly and hence we use page fault rerouting
> >>>>>>>>>>>> I am thinking that in this  case we may be able to sprinkle
> >>>>>>>>>>>> drm_dev_enter/exit in any such sensitive place were we might
> >>>>>>>>>>>> CPU access a DMA buffer from the kernel ?
> >>>>>>>>>>> Yes, I fear we are going to need that.
> >>>>>>>>>>>
> >>>>>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW
> >>>>>>>>>>>> memcpy ? Is there other places ?
> >>>>>>>>>>> Puh, good question. I have no idea.
> >>>>>>>>>>>
> >>>>>>>>>>>> Another point is that at this point the driver shouldn't access any
> >>>>>>>>>>>> such buffers as we are at the process finishing the device.
> >>>>>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I
> >>>>>>>>>>>> don't think there is anything else to do ?
> >>>>>>>>>>> Well there is a page fault handler for kernel mappings, but that one
> >>>>>>>>>>> just prints the stack trace into the system log and calls BUG(); :)
> >>>>>>>>>>>
> >>>>>>>>>>> Long story short we need to avoid any access to released pages after
> >>>>>>>>>>> unplug. No matter if it's from the kernel or userspace.
> >>>>>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU
> >>>>>>>>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in
> >>>>>>>>>> the code
> >>>>>>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the
> >>>>>>>>>> sake of device to main memory access). Kernel page table is not
> >>>>>>>>>> touched
> >>>>>>>>>> until last bo refcount is dropped and the bo is released
> >>>>>>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
> >>>>>>>>>> is both
> >>>>>>>>>> for GTT BOs mapped to kernel by kmap (or vmap) and for VRAM BOs mapped
> >>>>>>>>>> by ioremap. So as I see it, nothing bad will happen after we
> >>>>>>>>>> unpopulate a BO while we still try to use a kernel mapping for it,
> >>>>>>>>>> system memory pages backing GTT BOs are still mapped and not freed and
> >>>>>>>>>> for
> >>>>>>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel
> >>>>>>>>>> page table since iounmap wasn't called yet.
> >>>>>>>>> The problem is the system pages would be freed, and if the kernel driver
> >>>>>>>>> still happily writes to them we are pretty much busted because we write
> >>>>>>>>> to freed up memory.
> >>>>>>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will
> >>>>>>> release
> >>>>>>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since
> >>>>>>> ttm_bo_cleanup_memtype_use which also leads to pages release comes
> >>>>>>> before bo->destroy which unmaps the pages from kernel page table ? Won't
> >>>>>>> we end up writing to freed memory in this time interval ? Don't we
> >>>>>>> need to postpone pages freeing to after kernel page table unmapping ?
> >>>>>> BOs are only destroyed when there is a guarantee that nobody is
> >>>>>> accessing them any more.
> >>>>>>
> >>>>>> The problem here is that the pages as well as the VRAM can be
> >>>>>> immediately reused after the hotplug event.
> >>>>>>
> >>>>>>>> Similar for vram, if this is actual hotunplug and then replug, there's
> >>>>>>>> going to be a different device behind the same mmio bar range most
> >>>>>>>> likely (the higher bridges all this have the same windows assigned),
> >>>>>>> No idea how this actually works but if we haven't called iounmap yet
> >>>>>>> doesn't it mean that those physical ranges that are still mapped into
> >>>>>>> page
> >>>>>>> table should be reserved and cannot be reused for another
> >>>>>>> device ? As a guess, maybe another subrange from the higher bridge's
> >>>>>>> total
> >>>>>>> range will be allocated.
> >>>>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for
> >>>>>> a range when it is hotplugged.
> >>>>>>
> >>>>>>>> and that's bad news if we keep using it for current drivers. So we
> >>>>>>>> really have to point all these cpu ptes to some other place.
> >>>>>>> We can't just unmap it without syncing against any in kernel accesses
> >>>>>>> to those buffers
> >>>>>>> and since page faulting technique we use for user mapped buffers seems
> >>>>>>> to not be possible
> >>>>>>> for kernel mapped buffers I am not sure how to do it gracefully...
> >>>>>> We could try to replace the kmap with a dummy page under the hood, but
> >>>>>> that is extremely tricky.
> >>>>>>
> >>>>>> Especially since BOs which are just 1 page in size could point to the
> >>>>>> linear mapping directly.
> >>>>> I think it's just more work. Essentially
> >>>>> - convert as much as possible of the kernel mappings to vmap_local,
> >>>>> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
> >>>>> serve as a barrier, and ofc any new vmap needs to fail or hand out a
> >>>>> dummy mapping.
> >>>> Read those patches. I am not sure how this helps with protecting
> >>>> against accesses to released backing pages or IO physical ranges of BO
> >>>> which is already mapped during the unplug event ?
> >>> By eliminating such users, and replacing them with local maps which
> >>> are strictly bound in how long they can exist (and hence we can
> >>> serialize against them finishing in our hotunplug code).
> >> Not sure I see how serializing against BO map/unmap helps -  our problem as
> >> you described is that once
> >> device is extracted and then something else quickly takes its place in the
> >> PCI topology
> >> and gets assigned same physical IO ranges, then our driver will start accessing this
> >> new device because our 'zombie' BOs are still pointing to those ranges.
> > Until your driver's remove callback is finished the ranges stay reserved.
>
>
> The ranges stay reserved until unmapped which happens in bo->destroy
> which for most internally allocated  buffers is during sw_fini when last drm_put
> is called.
>
>
> > If that's not the case, then hotunplug would be fundamentally impossible
> > to handle correctly.
> >
> > Of course all the mmio actions will time out, so it might take some time
> > to get through it all.
>
>
> I found that PCI code provides a pci_device_is_present function
> we can use to avoid timeouts - it reads the device vendor ID and checks whether
> all 1s are returned. We can call it from within register accessors before
> trying a read/write

drm_dev_enter/exit is a _lot_ less overhead, plus makes a lot stronger
guarantees for hotunplug ordering. Please use that one instead of
hand-rolling something which only mostly works for closing hotunplug
races. pciconfig access is really slow.
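The all-1s check Andrey describes can be modelled in isolation. Below is a toy userspace illustration of the principle behind pci_device_is_present(), not kernel code: a hot-removed PCI device answers every config-space read with all 1s, so a vendor ID of 0xffff means the device is gone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of pci_device_is_present(): compare a vendor ID read
 * against 0xffff (the value a floating/removed bus returns).  Note
 * that each real check costs a slow config-space read, which is
 * Daniel's objection to doing it in every register accessor. */
static bool mock_device_is_present(uint16_t vendor_id)
{
	return vendor_id != 0xffff;
}
```
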

> >> Another point regarding serializing - problem  is that some of those BOs are
> >> very long lived, take for example the HW command
> >> ring buffer Christian mentioned before -
> >> (amdgpu_ring_init->amdgpu_bo_create_kernel), its life span
> >> is basically for the entire time the device exists, it's destroyed only in
> >> the SW fini stage (when last drm_dev
> >> reference is dropped) and so should I grab its dma_resv_lock from
> >> amdgpu_pci_remove code and wait
> >> for it to be unmapped before proceeding with the PCI remove code ? This can
> >> take unbounded time and that's why I don't understand
> >> how serializing will help.
> > Uh you need to untangle that. After hw cleanup is done no one is allowed
> > to touch that ringbuffer bo anymore from the kernel.
>
>
> I would assume we are not allowed to touch it once we identified the device is
> gone in order to minimize the chance of accidental writes to some other device
> which might now
> occupy those IO ranges ?
>
>
> >   That's what
> > drm_dev_enter/exit guards are for. Like you say we can't wait for all sw
> > references to disappear.
>
>
> Yes, it didn't make sense to me why we would use vmap_local for internally
> allocated buffers. I think we should also guard register reads/writes for the
> same reason as above.
>
>
> >
> > The vmap_local is for mappings done by other drivers, through the dma-buf
> > interface (where "other drivers" can include fbdev/fbcon, if you use the
> > generic helpers).
> > -Daniel
>
>
> Ok, so I assumed that with vmap_local you were trying to solve the problem of
> quick reinsertion
> of another device into the same MMIO range that my driver still points to, but
> actually are you trying to solve
> the issue of exported dma buffers outliving the device ? For this we have
> drm_device refcount in the GEM layer
> I think.

That's completely different lifetime problems. Don't mix them up :-)
One problem is the hardware disappearing, and for that we _have_ to
guarantee timeliness, or otherwise the pci subsystem gets pissed
(since like you say, a new device might show up and need its mmio
bars assigned to io ranges). The other is the lifetime of the software
objects we use as interfaces, both from userspace and from other
kernel drivers. There we fundamentally can't enforce timely cleanup,
and have to resort to refcounting.

We need both.
-Daniel
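The two mechanisms Daniel distinguishes can be sketched side by side in a hypothetical remove path (a sketch assuming a generic DRM PCI driver, not amdgpu's actual code; drm_dev_unplug and drm_dev_put are real DRM core interfaces):

```c
#include <drm/drm_drv.h>
#include <linux/pci.h>

static void example_pci_remove(struct pci_dev *pdev)
{
	struct drm_device *dev = pci_get_drvdata(pdev);

	/* Timely part: mark the hardware as gone.  From here on
	 * drm_dev_enter() fails, no new hardware access starts, and the
	 * PCI core can hand the BAR ranges to a newly plugged device. */
	drm_dev_unplug(dev);

	/* ... teardown of irqs, fences, and MMIO mappings ... */

	/* Refcounted part: the drm_device and the software objects
	 * hanging off it live on until userspace and other kernel
	 * users drop their last reference. */
	drm_dev_put(dev);
}
```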

> Andrey
>
>
> >
> >> Andrey
> >>
> >>
> >>> It doesn't
> >>> solve all your problems, but it's a tool to get there.
> >>> -Daniel
> >>>
> >>>> Andrey
> >>>>
> >>>>
> >>>>> - handle fbcon somehow. I think shutting it all down should work out.
> >>>>> - worst case keep the system backing storage around for shared dma-buf
> >>>>> until the other non-dynamic driver releases it. for vram we require
> >>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
> >>>>> pinning of importer buffers, might need to revisit that).
> >>>>>
> >>>>> Cheers, Daniel
> >>>>>
> >>>>>> Christian.
> >>>>>>
> >>>>>>> Andrey
> >>>>>>>
> >>>>>>>
> >>>>>>>> -Daniel
> >>>>>>>>
> >>>>>>>>> Christian.
> >>>>>>>>>
> >>>>>>>>>> I loaded the driver with vm_update_mode=3
> >>>>>>>>>> meaning all VM updates are done using the CPU, and I haven't seen any OOPSes after
> >>>>>>>>>> removing the device. I guess I can test it more by allocating GTT and
> >>>>>>>>>> VRAM BOs
> >>>>>>>>>> and trying to read/write to them after device is removed.
> >>>>>>>>>>
> >>>>>>>>>> Andrey
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Christian.
> >>>>>>>>>>>
> >>>>>>>>>>>> Andrey
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> amd-gfx mailing list
> >>>>>>>>>> amd-gfx@lists.freedesktop.org
> >>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >>>>>>>>>>
> >>>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-12-17 20:42                                             ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-17 20:42 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Rob Herring, amd-gfx list, Greg KH, dri-devel, Anholt, Eric,
	Pekka Paalanen, Qiang Yu, Alex Deucher, Wentland, Harry,
	Christian König, Lucas Stach

On Thu, Dec 17, 2020 at 8:19 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
> On 12/17/20 7:01 AM, Daniel Vetter wrote:
> > On Wed, Dec 16, 2020 at 07:20:02PM -0500, Andrey Grodzovsky wrote:
> >> On 12/16/20 6:15 PM, Daniel Vetter wrote:
> >>> On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky
> >>> <Andrey.Grodzovsky@amd.com> wrote:
> >>>> On 12/16/20 12:12 PM, Daniel Vetter wrote:
> >>>>> On Wed, Dec 16, 2020 at 5:18 PM Christian König
> >>>>> <christian.koenig@amd.com> wrote:
> >>>>>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
> >>>>>>> On 12/16/20 9:21 AM, Daniel Vetter wrote:
> >>>>>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König
> >>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>>>>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
> >>>>>>>>>> [SNIP]
> >>>>>>>>>>>> While we can't control user application accesses to the mapped
> >>>>>>>>>>>> buffers explicitly and hence we use page fault rerouting
> >>>>>>>>>>>> I am thinking that in this  case we may be able to sprinkle
> >>>>>>>>>>>> drm_dev_enter/exit in any such sensitive place were we might
> >>>>>>>>>>>> CPU access a DMA buffer from the kernel ?
> >>>>>>>>>>> Yes, I fear we are going to need that.
> >>>>>>>>>>>
> >>>>>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW
> >>>>>>>>>>>> memcpy ? Is there other places ?
> >>>>>>>>>>> Puh, good question. I have no idea.
> >>>>>>>>>>>
> >>>>>>>>>>>> Another point is that at this point the driver shouldn't access any
> >>>>>>>>>>>> such buffers as we are at the process finishing the device.
> >>>>>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I
> >>>>>>>>>>>> don't think there is anything else to do ?
> >>>>>>>>>>> Well there is a page fault handler for kernel mappings, but that one
> >>>>>>>>>>> just prints the stack trace into the system log and calls BUG(); :)
> >>>>>>>>>>>
> >>>>>>>>>>> Long story short we need to avoid any access to released pages after
> >>>>>>>>>>> unplug. No matter if it's from the kernel or userspace.
> >>>>>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU
> >>>>>>>>>> accesses from kernel to GTT ot VRAM buffers but then i looked more in
> >>>>>>>>>> the code
> >>>>>>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the
> >>>>>>>>>> sake of device to main memory access). Kernel page table is not
> >>>>>>>>>> touched
> >>>>>>>>>> until last bo refcount is dropped and the bo is released
> >>>>>>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
> >>>>>>>>>> is both
> >>>>>>>>>> for GTT BOs mapped to kernel by kmap (or vmap) and for VRAM BOs mapped
> >>>>>>>>>> by ioremap. So as I see it, nothing bad will happen after we
> >>>>>>>>>> unpopulate a BO while we still try to use a kernel mapping for it,
> >>>>>>>>>> system memory pages backing GTT BOs are still mapped and not freed and
> >>>>>>>>>> for
> >>>>>>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel
> >>>>>>>>>> page table since iounmap wasn't called yet.
> >>>>>>>>> The problem is the system pages would be freed, and if the kernel driver
> >>>>>>>>> still happily writes to them we are pretty much busted because we write
> >>>>>>>>> to freed up memory.
> >>>>>>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will
> >>>>>>> release
> >>>>>>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since
> >>>>>>> ttm_bo_cleanup_memtype_use which also leads to pages release comes
> >>>>>>> before bo->destroy which unmaps the pages from kernel page table ? Won't
> >>>>>>> we end up writing to freed memory in this time interval ? Don't we
> >>>>>>> need to postpone pages freeing to after kernel page table unmapping ?
> >>>>>> BOs are only destroyed when there is a guarantee that nobody is
> >>>>>> accessing them any more.
> >>>>>>
> >>>>>> The problem here is that the pages as well as the VRAM can be
> >>>>>> immediately reused after the hotplug event.
> >>>>>>
> >>>>>>>> Similar for vram, if this is actual hotunplug and then replug, there's
> >>>>>>>> going to be a different device behind the same mmio bar range most
> >>>>>>>> likely (the higher bridges all this have the same windows assigned),
> >>>>>>> No idea how this actually works but if we haven't called iounmap yet
> >>>>>>> doesn't it mean that those physical ranges that are still mapped into
> >>>>>>> page
> >>>>>>> table should be reserved and cannot be reused for another
> >>>>>>> device ? As a guess, maybe another subrange from the higher bridge's
> >>>>>>> total
> >>>>>>> range will be allocated.
> >>>>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for
> >>>>>> a range when it is hotplugged.
> >>>>>>
> >>>>>>>> and that's bad news if we keep using it for current drivers. So we
> >>>>>>>> really have to point all these cpu ptes to some other place.
> >>>>>>> We can't just unmap it without syncing against any in kernel accesses
> >>>>>>> to those buffers
> >>>>>>> and since page faulting technique we use for user mapped buffers seems
> >>>>>>> to not be possible
> >>>>>>> for kernel mapped buffers I am not sure how to do it gracefully...
> >>>>>> We could try to replace the kmap with a dummy page under the hood, but
> >>>>>> that is extremely tricky.
> >>>>>>
> >>>>>> Especially since BOs which are just 1 page in size could point to the
> >>>>>> linear mapping directly.
> >>>>> I think it's just more work. Essentially
> >>>>> - convert as much as possible of the kernel mappings to vmap_local,
> >>>>> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
> >>>>> serve as a barrier, and ofc any new vmap needs to fail or hand out a
> >>>>> dummy mapping.
> >>>> Read those patches. I am not sure how this helps with protecting
> >>>> against accesses to released backing pages or IO physical ranges of BO
> >>>> which is already mapped during the unplug event ?
> >>> By eliminating such users, and replacing them with local maps which
> >>> are strictly bound in how long they can exist (and hence we can
> >>> serialize against them finishing in our hotunplug code).
> >> Not sure I see how serializing against BO map/unmap helps -  our problem as
> >> you described is that once
> >> device is extracted and then something else quickly takes its place in the
> >> PCI topology
> >> and gets assigned same physical IO ranges, then our driver will start accessing this
> >> new device because our 'zombie' BOs are still pointing to those ranges.
> > Until your driver's remove callback is finished the ranges stay reserved.
>
>
> The ranges stay reserved until unmapped which happens in bo->destroy
> which for most internally allocated  buffers is during sw_fini when last drm_put
> is called.
>
>
> > If that's not the case, then hotunplug would be fundamentally impossible
> > to handle correctly.
> >
> > Of course all the mmio actions will time out, so it might take some time
> > to get through it all.
>
>
> I found that PCI code provides a pci_device_is_present function
> we can use to avoid timeouts - it reads the device vendor ID and checks whether
> all 1s are returned. We can call it from within register accessors before
> trying a read/write

drm_dev_enter/exit is a _lot_ less overhead, plus makes a lot stronger
guarantees for hotunplug ordering. Please use that one instead of
hand-rolling something which only mostly works for closing hotunplug
races. pciconfig access is really slow.

> >> Another point regarding serializing - problem  is that some of those BOs are
> >> very long lived, take for example the HW command
> >> ring buffer Christian mentioned before -
> >> (amdgpu_ring_init->amdgpu_bo_create_kernel), its life span
> >> is basically for the entire time the device exists, it's destroyed only in
> >> the SW fini stage (when last drm_dev
> >> reference is dropped) and so should I grab its dma_resv_lock from
> >> amdgpu_pci_remove code and wait
> >> for it to be unmapped before proceeding with the PCI remove code ? This can
> >> take unbounded time and that's why I don't understand
> >> how serializing will help.
> > Uh you need to untangle that. After hw cleanup is done no one is allowed
> > to touch that ringbuffer bo anymore from the kernel.
>
>
> I would assume we are not allowed to touch it once we identified the device is
> gone in order to minimize the chance of accidental writes to some other device
> which might now
> occupy those IO ranges ?
>
>
> >   That's what
> > drm_dev_enter/exit guards are for. Like you say we can't wait for all sw
> > references to disappear.
>
>
> Yes, it didn't make sense to me why we would use vmap_local for internally
> allocated buffers. I think we should also guard register reads/writes for the
> same reason as above.
>
>
> >
> > The vmap_local is for mappings done by other drivers, through the dma-buf
> > interface (where "other drivers" can include fbdev/fbcon, if you use the
> > generic helpers).
> > -Daniel
>
>
> Ok, so I assumed that with vmap_local you were trying to solve the problem of
> quick reinsertion
> of another device into the same MMIO range that my driver still points to, but
> actually are you trying to solve
> the issue of exported dma buffers outliving the device ? For this we have
> drm_device refcount in the GEM layer
> I think.

Those are two completely different lifetime problems. Don't mix them up :-)
One problem is the hardware disappearing, and for that we _have_ to
guarantee timeliness, or otherwise the pci subsystem gets pissed
(since, like you say, a new device might show up and need its mmio
bars assigned to io ranges). The other is the lifetime of the software
objects we use as interfaces, both from userspace and from other
kernel drivers. There we fundamentally can't enforce timely cleanup,
and have to resort to refcounting.

We need both.
-Daniel

> Andrey
>
>
> >
> >> Andrey
> >>
> >>
> >>> It doesn't
> >>> solve all your problems, but it's a tool to get there.
> >>> -Daniel
> >>>
> >>>> Andrey
> >>>>
> >>>>
> >>>>> - handle fbcon somehow. I think shutting it all down should work out.
> >>>>> - worst case keep the system backing storage around for shared dma-buf
> >>>>> until the other non-dynamic driver releases it. for vram we require
> >>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
> >>>>> pinning of importer buffers, might need to revisit that).
> >>>>>
> >>>>> Cheers, Daniel
> >>>>>
> >>>>>> Christian.
> >>>>>>
> >>>>>>> Andrey
> >>>>>>>
> >>>>>>>
> >>>>>>>> -Daniel
> >>>>>>>>
> >>>>>>>>> Christian.
> >>>>>>>>>
> >>>>>>>>>> I loaded the driver with vm_update_mode=3
> >>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after
> >>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and
> >>>>>>>>>> VRAM BOs
> >>>>>>>>>> and trying to read/write to them after device is removed.
> >>>>>>>>>>
> >>>>>>>>>> Andrey
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> Regards,
> >>>>>>>>>>> Christian.
> >>>>>>>>>>>
> >>>>>>>>>>>> Andrey
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> amd-gfx mailing list
> >>>>>>>>>> amd-gfx@lists.freedesktop.org
> >>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >>>>>>>>>>
> >>>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17 20:38                                               ` Andrey Grodzovsky
@ 2020-12-17 20:48                                                 ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-17 20:48 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher,
	Christian König

On Thu, Dec 17, 2020 at 9:38 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
> On 12/17/20 3:10 PM, Christian König wrote:
> > [SNIP]
> >>>> By eliminating such users, and replacing them with local maps which
> >>>>> are strictly bound in how long they can exist (and hence we can
> >>>>> serialize against them finishing in our hotunplug code).
> >>>> Not sure I see how serializing against BO map/unmap helps - our problem as
> >>>> you described is that once
> >>>> device is extracted and then something else quickly takes it's place in the
> >>>> PCI topology
> >>>> and gets assigned same physical IO ranges, then our driver will start
> >>>> accessing this
> >>>> new device because our 'zombie' BOs are still pointing to those ranges.
> >>> Until your driver's remove callback is finished the ranges stay reserved.
> >>
> >>
> >> The ranges stay reserved until unmapped which happens in bo->destroy
> >
> > I'm not sure of that. Why do you think that?
>
>
> Because of this sequence
> ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap
> Is there another place I am missing ?

iounmap only removes the mapping; it doesn't release anything in the resource tree.

And I don't think we should keep resources reserved past the pci
remove callback, because that would upset the pci subsystem trying to
assign resources to a newly hotplugged pci device.

Also from a quick check amdgpu does not reserve the pci bars it's
using. Somehow most drm drivers don't do that, not exactly sure why;
maybe auto-enumeration of resources just works well enough that we don't
need the safety net of kernel/resource.c anymore.
-Daniel


> >
> >> which for most internally allocated buffers is during sw_fini when last drm_put
> >> is called.
> >>
> >>
> >>> If that's not the case, then hotunplug would be fundamentally impossible
> >>> to handle correctly.
> >>>
> >>> Of course all the mmio actions will time out, so it might take some time
> >>> to get through it all.
> >>
> >>
> >> I found that PCI code provides pci_device_is_present function
> >> we can use to avoid timeouts - it reads device vendor and checks if all 1s is
> >> returned
> >> or not. We can call it from within register accessors before trying read/write
> >
> > That's way too much overhead! We need to keep that much lower or it will result
> > in quite a performance drop.
> >
> > I suggest to rather think about adding drm_dev_enter/exit guards.
>
>
> Sure, this one is just a bit upstream to the disconnect event. Eventually none
> of them is watertight.
>
> Andrey
>
>
> >
> > Christian.
> >
> >>
> >>>> Another point regarding serializing - problem  is that some of those BOs are
> >>>> very long lived, take for example the HW command
> >>>> ring buffer Christian mentioned before -
> >>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span
> >>>> is basically for the entire time the device exists, it's destroyed only in
> >>>> the SW fini stage (when last drm_dev
> >>>> reference is dropped) and so should I grab it's dma_resv_lock from
> >>>> amdgpu_pci_remove code and wait
> >>>> for it to be unmapped before proceeding with the PCI remove code ? This can
> >>>> take unbounded time and that's why I don't understand
> >>>> how serializing will help.
> >>> Uh you need to untangle that. After hw cleanup is done no one is allowed
> >>> to touch that ringbuffer bo anymore from the kernel.
> >>
> >>
> >> I would assume we are not allowed to touch it once we identified the device is
> >> gone in order to minimize the chance of accidental writes to some other
> >> device which might now
> >> occupy those IO ranges ?
> >>
> >>
> >>>   That's what
> >>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
> >>> references to disappear.
> >>
> >>
> >> Yes, didn't make sense to me why would we use vmap_local for internally
> >> allocated buffers. I think we should also guard registers read/writes for the
> >> same reason as above.
> >>
> >>
> >>>
> >>> The vmap_local is for mappings done by other drivers, through the dma-buf
> >>> interface (where "other drivers" can include fbdev/fbcon, if you use the
> >>> generic helpers).
> >>> -Daniel
> >>
> >>
> >> Ok, so I assumed that with vmap_local you were trying to solve the problem of
> >> quick reinsertion
> >> of another device into the same MMIO range that my driver still points to but
> >> actually are you trying to solve
> >> the issue of exported dma buffers outliving the device ? For this we have
> >> drm_device refcount in the GEM layer
> >> i think.
> >>
> >> Andrey
> >>
> >>
> >>>
> >>>> Andrey
> >>>>
> >>>>
> >>>>> It doesn't
> >>>>> solve all your problems, but it's a tool to get there.
> >>>>> -Daniel
> >>>>>
> >>>>>> Andrey
> >>>>>>
> >>>>>>
> >>>>>>> - handle fbcon somehow. I think shutting it all down should work out.
> >>>>>>> - worst case keep the system backing storage around for shared dma-buf
> >>>>>>> until the other non-dynamic driver releases it. for vram we require
> >>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
> >>>>>>> pinning of importer buffers, might need to revisit that).
> >>>>>>>
> >>>>>>> Cheers, Daniel
> >>>>>>>
> >>>>>>>> Christian.
> >>>>>>>>
> >>>>>>>>> Andrey
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> -Daniel
> >>>>>>>>>>
> >>>>>>>>>>> Christian.
> >>>>>>>>>>>
> >>>>>>>>>>>> I loaded the driver with vm_update_mode=3
> >>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after
> >>>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and
> >>>>>>>>>>>> VRAM BOs
> >>>>>>>>>>>> and trying to read/write to them after device is removed.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Andrey
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Christian.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Andrey
> >>>>>
> >



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-12-17 20:48                                                 ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-17 20:48 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Rob Herring, amd-gfx list, Greg KH, dri-devel, Anholt, Eric,
	Pekka Paalanen, Qiang Yu, Alex Deucher, Wentland, Harry,
	Christian König, Lucas Stach

On Thu, Dec 17, 2020 at 9:38 PM Andrey Grodzovsky
<Andrey.Grodzovsky@amd.com> wrote:
>
>
> On 12/17/20 3:10 PM, Christian König wrote:
> > [SNIP]
> >>>> By eliminating such users, and replacing them with local maps which
> >>>>> are strictly bound in how long they can exist (and hence we can
> >>>>> serialize against them finishing in our hotunplug code).
> >>>> Not sure I see how serializing against BO map/unmap helps - our problem as
> >>>> you described is that once
> >>>> device is extracted and then something else quickly takes it's place in the
> >>>> PCI topology
> >>>> and gets assigned same physical IO ranges, then our driver will start
> >>>> accessing this
> >>>> new device because our 'zombie' BOs are still pointing to those ranges.
> >>> Until your driver's remove callback is finished the ranges stay reserved.
> >>
> >>
> >> The ranges stay reserved until unmapped which happens in bo->destroy
> >
> > I'm not sure of that. Why do you think that?
>
>
> Because of this sequence
> ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap
> Is there another place I am missing ?

iounmap only removes the mapping; it doesn't release anything in the resource tree.

And I don't think we should keep resources reserved past the pci
remove callback, because that would upset the pci subsystem trying to
assign resources to a newly hotplugged pci device.

Also from a quick check amdgpu does not reserve the pci bars it's
using. Somehow most drm drivers don't do that, not exactly sure why;
maybe auto-enumeration of resources just works well enough that we don't
need the safety net of kernel/resource.c anymore.
-Daniel


> >
> >> which for most internally allocated buffers is during sw_fini when last drm_put
> >> is called.
> >>
> >>
> >>> If that's not the case, then hotunplug would be fundamentally impossible
> >>> to handle correctly.
> >>>
> >>> Of course all the mmio actions will time out, so it might take some time
> >>> to get through it all.
> >>
> >>
> >> I found that PCI code provides pci_device_is_present function
> >> we can use to avoid timeouts - it reads device vendor and checks if all 1s is
> >> returned
> >> or not. We can call it from within register accessors before trying read/write
> >
> > That's way too much overhead! We need to keep that much lower or it will result
> > in quite a performance drop.
> >
> > I suggest to rather think about adding drm_dev_enter/exit guards.
>
>
> Sure, this one is just a bit upstream to the disconnect event. Eventually none
> of them is watertight.
>
> Andrey
>
>
> >
> > Christian.
> >
> >>
> >>>> Another point regarding serializing - problem  is that some of those BOs are
> >>>> very long lived, take for example the HW command
> >>>> ring buffer Christian mentioned before -
> >>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span
> >>>> is basically for the entire time the device exists, it's destroyed only in
> >>>> the SW fini stage (when last drm_dev
> >>>> reference is dropped) and so should I grab it's dma_resv_lock from
> >>>> amdgpu_pci_remove code and wait
> >>>> for it to be unmapped before proceeding with the PCI remove code ? This can
> >>>> take unbounded time and that's why I don't understand
> >>>> how serializing will help.
> >>> Uh you need to untangle that. After hw cleanup is done no one is allowed
> >>> to touch that ringbuffer bo anymore from the kernel.
> >>
> >>
> >> I would assume we are not allowed to touch it once we identified the device is
> >> gone in order to minimize the chance of accidental writes to some other
> >> device which might now
> >> occupy those IO ranges ?
> >>
> >>
> >>>   That's what
> >>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
> >>> references to disappear.
> >>
> >>
> >> Yes, didn't make sense to me why would we use vmap_local for internally
> >> allocated buffers. I think we should also guard registers read/writes for the
> >> same reason as above.
> >>
> >>
> >>>
> >>> The vmap_local is for mappings done by other drivers, through the dma-buf
> >>> interface (where "other drivers" can include fbdev/fbcon, if you use the
> >>> generic helpers).
> >>> -Daniel
> >>
> >>
> >> Ok, so I assumed that with vmap_local you were trying to solve the problem of
> >> quick reinsertion
> >> of another device into the same MMIO range that my driver still points to but
> >> actually are you trying to solve
> >> the issue of exported dma buffers outliving the device ? For this we have
> >> drm_device refcount in the GEM layer
> >> i think.
> >>
> >> Andrey
> >>
> >>
> >>>
> >>>> Andrey
> >>>>
> >>>>
> >>>>> It doesn't
> >>>>> solve all your problems, but it's a tool to get there.
> >>>>> -Daniel
> >>>>>
> >>>>>> Andrey
> >>>>>>
> >>>>>>
> >>>>>>> - handle fbcon somehow. I think shutting it all down should work out.
> >>>>>>> - worst case keep the system backing storage around for shared dma-buf
> >>>>>>> until the other non-dynamic driver releases it. for vram we require
> >>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
> >>>>>>> pinning of importer buffers, might need to revisit that).
> >>>>>>>
> >>>>>>> Cheers, Daniel
> >>>>>>>
> >>>>>>>> Christian.
> >>>>>>>>
> >>>>>>>>> Andrey
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> -Daniel
> >>>>>>>>>>
> >>>>>>>>>>> Christian.
> >>>>>>>>>>>
> >>>>>>>>>>>> I loaded the driver with vm_update_mode=3
> >>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after
> >>>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and
> >>>>>>>>>>>> VRAM BOs
> >>>>>>>>>>>> and trying to read/write to them after device is removed.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Andrey
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>> Christian.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Andrey
> >>>>>
> >



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17 20:48                                                 ` Daniel Vetter
@ 2020-12-17 21:06                                                   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-17 21:06 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher,
	Christian König


On 12/17/20 3:48 PM, Daniel Vetter wrote:
> On Thu, Dec 17, 2020 at 9:38 PM Andrey Grodzovsky
> <Andrey.Grodzovsky@amd.com> wrote:
>>
>> On 12/17/20 3:10 PM, Christian König wrote:
>>> [SNIP]
>>>>>> By eliminating such users, and replacing them with local maps which
>>>>>>> are strictly bound in how long they can exist (and hence we can
>>>>>>> serialize against them finishing in our hotunplug code).
>>>>>> Not sure I see how serializing against BO map/unmap helps - our problem as
>>>>>> you described is that once
>>>>>> device is extracted and then something else quickly takes it's place in the
>>>>>> PCI topology
>>>>>> and gets assigned same physical IO ranges, then our driver will start
>>>>>> accessing this
>>>>>> new device because our 'zombie' BOs are still pointing to those ranges.
>>>>> Until your driver's remove callback is finished the ranges stay reserved.
>>>>
>>>> The ranges stay reserved until unmapped which happens in bo->destroy
>>> I'm not sure of that. Why do you think that?
>>
>> Because of this sequence
>> ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap
>> Is there another place I am missing ?
> iounmap is just the mapping, it doesn't reserve anything in the resource tree.
>
> And I don't think we should keep resources reserved past the pci
> remove callback, because that would upset the pci subsystem trying to
> assign resources to a newly hotplugged pci device.


I assumed we were talking about VA ranges still mapped in the page table, and
that part of ioremap is also reservation of the mapped physical ranges. In fact,
if we can explicitly reserve those ranges (as you mention here), then together
with postponing the freeing/releasing of system memory pages back to the page
pool until after the BO is unmapped from the kernel address space, I believe
this could solve the issue of quick HW reinsertion and make all the
drm_dev_enter/exit guarding obsolete.

Andrey


> Also from a quick check amdgpu does not reserve the pci bars it's
> using. Somehow most drm drivers don't do that, not exactly sure why,
> maybe auto-enumeration of resources just works too good and we don't
> need the safety net of kernel/resource.c anymore.
> -Daniel
>
>
>>>> which for most internally allocated buffers is during sw_fini when last drm_put
>>>> is called.
>>>>
>>>>
>>>>> If that's not the case, then hotunplug would be fundamentally impossible
>>>>> to handle correctly.
>>>>>
>>>>> Of course all the mmio actions will time out, so it might take some time
>>>>> to get through it all.
>>>>
>>>> I found that PCI code provides pci_device_is_present function
>>>> we can use to avoid timeouts - it reads device vendor and checks if all 1s is
>>>> returned
>>>> or not. We can call it from within register accessors before trying read/write
>>> That's way too much overhead! We need to keep that much lower or it will result
>>> in quite a performance drop.
>>>
>>> I suggest to rather think about adding drm_dev_enter/exit guards.
>>
>> Sure, this one is just a bit upstream to the disconnect event. Eventually none
>> of them is watertight.
>>
>> Andrey
>>
>>
>>> Christian.
>>>
>>>>>> Another point regarding serializing - problem  is that some of those BOs are
>>>>>> very long lived, take for example the HW command
>>>>>> ring buffer Christian mentioned before -
>>>>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span
>>>>>> is basically for the entire time the device exists, it's destroyed only in
>>>>>> the SW fini stage (when last drm_dev
>>>>>> reference is dropped) and so should I grab it's dma_resv_lock from
>>>>>> amdgpu_pci_remove code and wait
>>>>>> for it to be unmapped before proceeding with the PCI remove code ? This can
>>>>>> take unbounded time and that's why I don't understand
>>>>>> how serializing will help.
>>>>> Uh you need to untangle that. After hw cleanup is done no one is allowed
>>>>> to touch that ringbuffer bo anymore from the kernel.
>>>>
>>>> I would assume we are not allowed to touch it once we identified the device is
>>>> gone in order to minimize the chance of accidental writes to some other
>>>> device which might now
>>>> occupy those IO ranges ?
>>>>
>>>>
>>>>>    That's what
>>>>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
>>>>> references to disappear.
>>>>
>>>> Yes, didn't make sense to me why would we use vmap_local for internally
>>>> allocated buffers. I think we should also guard registers read/writes for the
>>>> same reason as above.
>>>>
>>>>
>>>>> The vmap_local is for mappings done by other drivers, through the dma-buf
>>>>> interface (where "other drivers" can include fbdev/fbcon, if you use the
>>>>> generic helpers).
>>>>> -Daniel
>>>>
>>>> Ok, so I assumed that with vmap_local you were trying to solve the problem of
>>>> quick reinsertion
>>>> of another device into the same MMIO range that my driver still points to but
>>>> actually are you trying to solve
>>>> the issue of exported dma buffers outliving the device ? For this we have
>>>> drm_device refcount in the GEM layer
>>>> i think.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> It doesn't
>>>>>>> solve all your problems, but it's a tool to get there.
>>>>>>> -Daniel
>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>> - handle fbcon somehow. I think shutting it all down should work out.
>>>>>>>>> - worst case keep the system backing storage around for shared dma-buf
>>>>>>>>> until the other non-dynamic driver releases it. for vram we require
>>>>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>>>>>>>> pinning of importer buffers, might need to revisit that).
>>>>>>>>>
>>>>>>>>> Cheers, Daniel
>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I loaded the driver with vm_update_mode=3
>>>>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after
>>>>>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and
>>>>>>>>>>>>>> VRAM BOs
>>>>>>>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>
>


* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-12-17 21:06                                                   ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-17 21:06 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Rob Herring, amd-gfx list, Greg KH, dri-devel, Anholt, Eric,
	Pekka Paalanen, Qiang Yu, Alex Deucher, Wentland, Harry,
	Christian König, Lucas Stach


On 12/17/20 3:48 PM, Daniel Vetter wrote:
> On Thu, Dec 17, 2020 at 9:38 PM Andrey Grodzovsky
> <Andrey.Grodzovsky@amd.com> wrote:
>>
>> On 12/17/20 3:10 PM, Christian König wrote:
>>> [SNIP]
>>>>>> By eliminating such users, and replacing them with local maps which
>>>>>>> are strictly bound in how long they can exist (and hence we can
>>>>>>> serialize against them finishing in our hotunplug code).
>>>>>> Not sure I see how serializing against BO map/unmap helps - our problem as
>>>>>> you described is that once
>>>>>> device is extracted and then something else quickly takes it's place in the
>>>>>> PCI topology
>>>>>> and gets assigned same physical IO ranges, then our driver will start
>>>>>> accessing this
>>>>>> new device because our 'zombie' BOs are still pointing to those ranges.
>>>>> Until your driver's remove callback is finished the ranges stay reserved.
>>>>
>>>> The ranges stay reserved until unmapped which happens in bo->destroy
>>> I'm not sure of that. Why do you think that?
>>
>> Because of this sequence
>> ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap
>> Is there another place I am missing ?
> iounmap is just the mapping, it doesn't reserve anything in the resource tree.
>
> And I don't think we should keep resources reserved past the pci
> remove callback, because that would upset the pci subsystem trying to
> assign resources to a newly hotplugged pci device.


I assumed we were talking about VA ranges still mapped in the page table, and
that part of ioremap is also reservation of the mapped physical ranges. In fact,
if we can explicitly reserve those ranges (as you mention here), then together
with postponing the freeing/releasing of system memory pages back to the page
pool until after the BO is unmapped from the kernel address space, I believe
this could solve the issue of quick HW reinsertion and make all the
drm_dev_enter/exit guarding obsolete.

Andrey


> Also from a quick check amdgpu does not reserve the pci bars it's
> using. Somehow most drm drivers don't do that, not exactly sure why,
> maybe auto-enumeration of resources just works too good and we don't
> need the safety net of kernel/resource.c anymore.
> -Daniel
>
>
>>>> which for most internally allocated buffers is during sw_fini when last drm_put
>>>> is called.
>>>>
>>>>
>>>>> If that's not the case, then hotunplug would be fundamentally impossible
>>>>> to handle correctly.
>>>>>
>>>>> Of course all the mmio actions will time out, so it might take some time
>>>>> to get through it all.
>>>>
>>>> I found that PCI code provides pci_device_is_present function
>>>> we can use to avoid timeouts - it reads device vendor and checks if all 1s is
>>>> returned
>>>> or not. We can call it from within register accessors before trying read/write
>>> That's way too much overhead! We need to keep that much lower or it will result
>>> in quite a performance drop.
>>>
>>> I suggest to rather think about adding drm_dev_enter/exit guards.
>>
>> Sure, this one is just a bit upstream to the disconnect event. Eventually none
>> of them is watertight.
>>
>> Andrey
>>
>>
>>> Christian.
>>>
>>>>>> Another point regarding serializing - problem  is that some of those BOs are
>>>>>> very long lived, take for example the HW command
>>>>>> ring buffer Christian mentioned before -
>>>>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), it's life span
>>>>>> is basically for the entire time the device exists, it's destroyed only in
>>>>>> the SW fini stage (when last drm_dev
>>>>>> reference is dropped) and so should I grab it's dma_resv_lock from
>>>>>> amdgpu_pci_remove code and wait
>>>>>> for it to be unmapped before proceeding with the PCI remove code ? This can
>>>>>> take unbounded time and that's why I don't understand
>>>>>> how serializing will help.
>>>>> Uh you need to untangle that. After hw cleanup is done no one is allowed
>>>>> to touch that ringbuffer bo anymore from the kernel.
>>>>
>>>> I would assume we are not allowed to touch it once we identified the device is
>>>> gone in order to minimize the chance of accidental writes to some other
>>>> device which might now
>>>> occupy those IO ranges ?
>>>>
>>>>
>>>>>    That's what
>>>>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
>>>>> references to disappear.
>>>>
>>>> Yes, didn't make sense to me why would we use vmap_local for internally
>>>> allocated buffers. I think we should also guard registers read/writes for the
>>>> same reason as above.
>>>>
>>>>
>>>>> The vmap_local is for mappings done by other drivers, through the dma-buf
>>>>> interface (where "other drivers" can include fbdev/fbcon, if you use the
>>>>> generic helpers).
>>>>> -Daniel
>>>>
>>>> Ok, so I assumed that with vmap_local you were trying to solve the problem of
>>>> quick reinsertion
>>>> of another device into the same MMIO range that my driver still points to but
>>>> actually are you trying to solve
>>>> the issue of exported dma buffers outliving the device ? For this we have
>>>> drm_device refcount in the GEM layer
>>>> i think.
>>>>
>>>> Andrey
>>>>
>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> It doesn't
>>>>>>> solve all your problems, but it's a tool to get there.
>>>>>>> -Daniel
>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>>
>>>>>>>>> - handle fbcon somehow. I think shutting it all down should work out.
>>>>>>>>> - worst case keep the system backing storage around for shared dma-buf
>>>>>>>>> until the other non-dynamic driver releases it. for vram we require
>>>>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>>>>>>>> pinning of importer buffers, might need to revisit that).
>>>>>>>>>
>>>>>>>>> Cheers, Daniel
>>>>>>>>>
>>>>>>>>>> Christian.
>>>>>>>>>>
>>>>>>>>>>> Andrey
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> -Daniel
>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I loaded the driver with vm_update_mode=3
>>>>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after
>>>>>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and
>>>>>>>>>>>>>> VRAM BOs
>>>>>>>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Andrey
>
>

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17 20:42                                             ` Daniel Vetter
@ 2020-12-17 21:13                                               ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2020-12-17 21:13 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: amd-gfx list, Greg KH, dri-devel, Qiang Yu, Alex Deucher,
	Christian König


On 12/17/20 3:42 PM, Daniel Vetter wrote:
> On Thu, Dec 17, 2020 at 8:19 PM Andrey Grodzovsky
> <Andrey.Grodzovsky@amd.com> wrote:
>>
>> On 12/17/20 7:01 AM, Daniel Vetter wrote:
>>> On Wed, Dec 16, 2020 at 07:20:02PM -0500, Andrey Grodzovsky wrote:
>>>> On 12/16/20 6:15 PM, Daniel Vetter wrote:
>>>>> On Wed, Dec 16, 2020 at 7:26 PM Andrey Grodzovsky
>>>>> <Andrey.Grodzovsky@amd.com> wrote:
>>>>>> On 12/16/20 12:12 PM, Daniel Vetter wrote:
>>>>>>> On Wed, Dec 16, 2020 at 5:18 PM Christian König
>>>>>>> <christian.koenig@amd.com> wrote:
>>>>>>>> Am 16.12.20 um 17:13 schrieb Andrey Grodzovsky:
>>>>>>>>> On 12/16/20 9:21 AM, Daniel Vetter wrote:
>>>>>>>>>> On Wed, Dec 16, 2020 at 9:04 AM Christian König
>>>>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>>>>>>> Am 15.12.20 um 21:18 schrieb Andrey Grodzovsky:
>>>>>>>>>>>> [SNIP]
>>>>>>>>>>>>>> While we can't control user application accesses to the mapped
>>>>>>>>>>>>>> buffers explicitly and hence we use page fault rerouting
>>>>>>>>>>>>>> I am thinking that in this  case we may be able to sprinkle
>>>>>>>>>>>>>> drm_dev_enter/exit in any such sensitive place were we might
>>>>>>>>>>>>>> CPU access a DMA buffer from the kernel ?
>>>>>>>>>>>>> Yes, I fear we are going to need that.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Things like CPU page table updates, ring buffer accesses and FW
>>>>>>>>>>>>>> memcpy ? Is there other places ?
>>>>>>>>>>>>> Puh, good question. I have no idea.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Another point is that at this point the driver shouldn't access any
>>>>>>>>>>>>>> such buffers as we are at the process finishing the device.
>>>>>>>>>>>>>> AFAIK there is no page fault mechanism for kernel mappings so I
>>>>>>>>>>>>>> don't think there is anything else to do ?
>>>>>>>>>>>>> Well there is a page fault handler for kernel mappings, but that one
>>>>>>>>>>>>> just prints the stack trace into the system log and calls BUG(); :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Long story short we need to avoid any access to released pages after
>>>>>>>>>>>>> unplug. No matter if it's from the kernel or userspace.
>>>>>>>>>>>> I was just about to start guarding with drm_dev_enter/exit CPU
>>>>>>>>>>>> accesses from kernel to GTT or VRAM buffers but then I looked more in
>>>>>>>>>>>> the code
>>>>>>>>>>>> and seems like ttm_tt_unpopulate just deletes DMA mappings (for the
>>>>>>>>>>>> sake of device to main memory access). Kernel page table is not
>>>>>>>>>>>> touched
>>>>>>>>>>>> until last bo refcount is dropped and the bo is released
>>>>>>>>>>>> (ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap). This
>>>>>>>>>>>> is both
>>>>>>>>>>>> for GTT BOs mapped to kernel by kmap (or vmap) and for VRAM BOs mapped
>>>>>>>>>>>> by ioremap. So as I see it, nothing bad will happen after we
>>>>>>>>>>>> unpopulate a BO while we still try to use a kernel mapping for it,
>>>>>>>>>>>> system memory pages backing GTT BOs are still mapped and not freed and
>>>>>>>>>>>> for
>>>>>>>>>>>> VRAM BOs same is for the IO physical ranges mapped into the kernel
>>>>>>>>>>>> page table since iounmap wasn't called yet.
>>>>>>>>>>> The problem is the system pages would be freed and if we kernel driver
>>>>>>>>>>> still happily write to them we are pretty much busted because we write
>>>>>>>>>>> to freed up memory.
>>>>>>>>> OK, i see i missed ttm_tt_unpopulate->..->ttm_pool_free which will
>>>>>>>>> release
>>>>>>>>> the GTT BO pages. But then isn't there a problem in ttm_bo_release since
>>>>>>>>> ttm_bo_cleanup_memtype_use which also leads to pages release comes
>>>>>>>>> before bo->destroy which unmaps the pages from kernel page table ? Won't
>>>>>>>>> we end up writing to freed memory in this time interval? Don't we
>>>>>>>>> need to postpone pages freeing to after kernel page table unmapping ?
>>>>>>>> BOs are only destroyed when there is a guarantee that nobody is
>>>>>>>> accessing them any more.
>>>>>>>>
>>>>>>>> The problem here is that the pages as well as the VRAM can be
>>>>>>>> immediately reused after the hotplug event.
>>>>>>>>
>>>>>>>>>> Similar for vram, if this is actual hotunplug and then replug, there's
>>>>>>>>>> going to be a different device behind the same mmio bar range most
>>>>>>>>>> likely (the higher bridges all this have the same windows assigned),
>>>>>>>>> No idea how this actually works but if we haven't called iounmap yet
>>>>>>>>> doesn't it mean that those physical ranges that are still mapped into
>>>>>>>>> page
>>>>>>>>> table should be reserved and cannot be reused for another
>>>>>>>>> device ? As a guess, maybe another subrange from the higher bridge's
>>>>>>>>> total
>>>>>>>>> range will be allocated.
>>>>>>>> Nope, the PCIe subsystem doesn't care about any ioremap still active for
>>>>>>>> a range when it is hotplugged.
>>>>>>>>
>>>>>>>>>> and that's bad news if we keep using it for current drivers. So we
>>>>>>>>>> really have to point all these cpu ptes to some other place.
>>>>>>>>> We can't just unmap it without syncing against any in kernel accesses
>>>>>>>>> to those buffers
>>>>>>>>> and since page faulting technique we use for user mapped buffers seems
>>>>>>>>> to not be possible
>>>>>>>>> for kernel mapped buffers I am not sure how to do it gracefully...
>>>>>>>> We could try to replace the kmap with a dummy page under the hood, but
>>>>>>>> that is extremely tricky.
>>>>>>>>
>>>>>>>> Especially since BOs which are just 1 page in size could point to the
>>>>>>>> linear mapping directly.
>>>>>>> I think it's just more work. Essentially
>>>>>>> - convert as much as possible of the kernel mappings to vmap_local,
>>>>>>> which Thomas Zimmermann is rolling out. That way a dma_resv_lock will
>>>>>>> serve as a barrier, and ofc any new vmap needs to fail or hand out a
>>>>>>> dummy mapping.
>>>>>> Read those patches. I am not sure how this helps with protecting
>>>>>> against accesses to released backing pages or IO physical ranges of BO
>>>>>> which is already mapped during the unplug event ?
>>>>> By eliminating such users, and replacing them with local maps which
>>>>> are strictly bound in how long they can exist (and hence we can
>>>>> serialize against them finishing in our hotunplug code).
>>>> Not sure I see how serializing against BO map/unmap helps -  our problem as
>>>> you described is that once
>>>> device is extracted and then something else quickly takes its place in the
>>>> PCI topology
>>>> and gets assigned same physical IO ranges, then our driver will start accessing this
>>>> new device because our 'zombie' BOs are still pointing to those ranges.
>>> Until your driver's remove callback is finished the ranges stay reserved.
>>
>> The ranges stay reserved until unmapped which happens in bo->destroy
>> which for most internally allocated  buffers is during sw_fini when last drm_put
>> is called.
>>
>>
>>> If that's not the case, then hotunplug would be fundamentally impossible
>>> to handle correctly.
>>>
>>> Of course all the mmio actions will time out, so it might take some time
>>> to get through it all.
>>
>> I found that PCI code provides pci_device_is_present function
>> we can use to avoid timeouts - it reads device vendor and checks if all 1s is
>> returned
>> or not. We can call it from within register accessors before trying read/write
> drm_dev_enter/exit is a _lot_ less overhead, plus makes a lot stronger
> guarantees for hotunplug ordering. Please use that one instead of
> hand-rolling something which only mostly works for closing hotunplug
> races. pciconfig access is really slow.
>
>>>> Another point regarding serializing - problem  is that some of those BOs are
>>>> very long lived, take for example the HW command
>>>> ring buffer Christian mentioned before -
>>>> (amdgpu_ring_init->amdgpu_bo_create_kernel), its life span
>>>> is basically for the entire time the device exists, it's destroyed only in
>>>> the SW fini stage (when last drm_dev
>>>> reference is dropped) and so should I grab its dma_resv_lock from
>>>> amdgpu_pci_remove code and wait
>>>> for it to be unmapped before proceeding with the PCI remove code ? This can
>>>> take unbound time and that why I don't understand
>>>> how serializing will help.
>>> Uh you need to untangle that. After hw cleanup is done no one is allowed
>>> to touch that ringbuffer bo anymore from the kernel.
>>
>> I would assume we are not allowed to touch it once we identified the device is
>> gone in order to minimize the chance of accidental writes to some other device
>> which might now
>> occupy those IO ranges ?
>>
>>
>>>    That's what
>>> drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
>>> references to disappear.
>>
>> Yes, didn't make sense to me why would we use vmap_local for internally
>> allocated buffers. I think we should also guard registers read/writes for the
>> same reason as above.
>>
>>
>>> The vmap_local is for mappings done by other drivers, through the dma-buf
>>> interface (where "other drivers" can include fbdev/fbcon, if you use the
>>> generic helpers).
>>> -Daniel
>>
>> Ok, so I assumed that with vmap_local you were trying to solve the problem of
>> quick reinsertion
>> of another device into same MMIO range that my driver still points too but
>> actually are you trying to solve
>> the issue of exported dma buffers outliving the device ? For this we have
>> drm_device refcount in the GEM layer
>> i think.
> That's completely different lifetime problems. Don't mix them up :-)
> One problem is the hardware disappearing, and for that we _have_ to
> guarantee timeliness, or otherwise the pci subsystem gets pissed
> (since like you say, a new device might show up and need its mmio
> bars assigned to io ranges). The other is lifetime of the software
> objects we use as interfaces, both from userspace and from other
> kernel drivers. There we fundamentally can't enforce timely cleanup,
> and have to resort to refcounting.


So regarding the second issue, as I mentioned above, don't we already use 
drm_dev_get/put
for exported BOs ? Earlier in this discussion you mentioned that we are ok for 
dma buffers since
we already have the refcounting at the GEM layer and the real life cycle problem 
we have is the dma_fences
for which there is no drm_dev refcounting. Seems to me then that vmap_local is 
superfluous because
of the refcounting we already have for exported dma_bufs and for dma_fences it 
won't help.

Andrey


>
> We need both.
> -Daniel
>
>> Andrey
>>
>>
>>>> Andrey
>>>>
>>>>
>>>>> It doesn't
>>>>> solve all your problems, but it's a tool to get there.
>>>>> -Daniel
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> - handle fbcon somehow. I think shutting it all down should work out.
>>>>>>> - worst case keep the system backing storage around for shared dma-buf
>>>>>>> until the other non-dynamic driver releases it. for vram we require
>>>>>>> dynamic importers (and maybe it wasn't such a bright idea to allow
>>>>>>> pinning of importer buffers, might need to revisit that).
>>>>>>>
>>>>>>> Cheers, Daniel
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Andrey
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> I loaded the driver with vm_update_mode=3
>>>>>>>>>>>> meaning all VM updates done using CPU and hasn't seen any OOPs after
>>>>>>>>>>>> removing the device. I guess i can test it more by allocating GTT and
>>>>>>>>>>>> VRAM BOs
>>>>>>>>>>>> and trying to read/write to them after device is removed.
>>>>>>>>>>>>
>>>>>>>>>>>> Andrey
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Andrey
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>>>>
>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17 21:06                                                   ` Andrey Grodzovsky
@ 2020-12-18 14:30                                                     ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-18 14:30 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Greg KH, dri-devel, amd-gfx list, Alex Deucher, Qiang Yu,
	Christian König

On Thu, Dec 17, 2020 at 04:06:38PM -0500, Andrey Grodzovsky wrote:
> 
> On 12/17/20 3:48 PM, Daniel Vetter wrote:
> > On Thu, Dec 17, 2020 at 9:38 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky@amd.com> wrote:
> > > 
> > > On 12/17/20 3:10 PM, Christian König wrote:
> > > > [SNIP]
> > > > > > > By eliminating such users, and replacing them with local maps which
> > > > > > > > are strictly bound in how long they can exist (and hence we can
> > > > > > > > serialize against them finishing in our hotunplug code).
> > > > > > > Not sure I see how serializing against BO map/unmap helps - our problem as
> > > > > > > you described is that once
> > > > > > > device is extracted and then something else quickly takes its place in the
> > > > > > > PCI topology
> > > > > > > and gets assigned same physical IO ranges, then our driver will start
> > > > > > > accessing this
> > > > > > > new device because our 'zombie' BOs are still pointing to those ranges.
> > > > > > Until your driver's remove callback is finished the ranges stay reserved.
> > > > > 
> > > > > The ranges stay reserved until unmapped which happens in bo->destroy
> > > > I'm not sure of that. Why do you think that?
> > > 
> > > Because of this sequence
> > > ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap
> > > Is there another place I am missing ?
> > iounmap is just the mapping, it doesn't reserve anything in the resource tree.
> > 
> > And I don't think we should keep resources reserved past the pci
> > remove callback, because that would upset the pci subsystem trying to
> > assign resources to a newly hotplugged pci device.
> 
> 
> I assumed we are talking about VA ranges still mapped in the page table. I
> just assumed
> that part of ioremap is also reservation of the mapped physical ranges. In
> fact, if we
> do can explicitly reserve those ranges (as you mention here) then together
> with postponing
> system memory pages freeing/releasing back to the page pool until after BO
> is unmapped
> from the kernel address space I believe this could solve the issue of quick
> HW reinsertion
> and make all the drm_dev_ener/exit guarding obsolete.

We can't reserve these ranges, that's what I tried to explain:
- kernel/resource.c isn't very consistently used
- the pci core will get pissed if there's suddenly a range in the middle
  of a bridge that it can't use
- nesting is allowed for resources, so this doesn't actually guarantee
  much

I just wanted to point out that ioremap doesn't do any reserving, so it's not
enough by far.

We really have to stop using any mmio ranges before the pci remove
callback is finished.
-Daniel

> 
> Andrey
> 
> 
> > Also from a quick check amdgpu does not reserve the pci bars it's
> > using. Somehow most drm drivers don't do that, not exactly sure why,
> > maybe auto-enumeration of resources just works too good and we don't
> > need the safety net of kernel/resource.c anymore.
> > -Daniel
> > 
> > 
> > > > > which for most internally allocated buffers is during sw_fini when last drm_put
> > > > > is called.
> > > > > 
> > > > > 
> > > > > > If that's not the case, then hotunplug would be fundamentally impossible
> > > > > > to handle correctly.
> > > > > > 
> > > > > > Of course all the mmio actions will time out, so it might take some time
> > > > > > to get through it all.
> > > > > 
> > > > > I found that PCI code provides pci_device_is_present function
> > > > > we can use to avoid timeouts - it reads device vendor and checks if all 1s is
> > > > > returned
> > > > > or not. We can call it from within register accessors before trying read/write
> > > > That's way too much overhead! We need to keep that much lower or it will result
> > > > in quite a performance drop.
> > > > 
> > > > I suggest to rather think about adding drm_dev_enter/exit guards.
> > > 
> > > Sure, this one is just a bit upstream to the disconnect event. Eventually none
> > > of them is watertight.
> > > 
> > > Andrey
> > > 
> > > 
> > > > Christian.
> > > > 
> > > > > > > Another point regarding serializing - problem  is that some of those BOs are
> > > > > > > very long lived, take for example the HW command
> > > > > > > ring buffer Christian mentioned before -
> > > > > > > (amdgpu_ring_init->amdgpu_bo_create_kernel), its life span
> > > > > > > is basically for the entire time the device exists, it's destroyed only in
> > > > > > > the SW fini stage (when last drm_dev
> > > > > > > reference is dropped) and so should I grab its dma_resv_lock from
> > > > > > > amdgpu_pci_remove code and wait
> > > > > > > for it to be unmapped before proceeding with the PCI remove code ? This can
> > > > > > > take unbound time and that why I don't understand
> > > > > > > how serializing will help.
> > > > > > Uh you need to untangle that. After hw cleanup is done no one is allowed
> > > > > > to touch that ringbuffer bo anymore from the kernel.
> > > > > 
> > > > > I would assume we are not allowed to touch it once we identified the device is
> > > > > gone in order to minimize the chance of accidental writes to some other
> > > > > device which might now
> > > > > occupy those IO ranges ?
> > > > > 
> > > > > 
> > > > > >    That's what
> > > > > > drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
> > > > > > references to disappear.
> > > > > 
> > > > > Yes, didn't make sense to me why would we use vmap_local for internally
> > > > > allocated buffers. I think we should also guard registers read/writes for the
> > > > > same reason as above.
> > > > > 
> > > > > 
> > > > > > The vmap_local is for mappings done by other drivers, through the dma-buf
> > > > > > interface (where "other drivers" can include fbdev/fbcon, if you use the
> > > > > > generic helpers).
> > > > > > -Daniel
> > > > > 
> > > > > Ok, so I assumed that with vmap_local you were trying to solve the problem of
> > > > > quick reinsertion
> > > > > of another device into the same MMIO range that my driver still points to, but
> > > > > actually are you trying to solve
> > > > > the issue of exported dma buffers outliving the device? For this we have the
> > > > > drm_device refcount in the GEM layer,
> > > > > I think.
> > > > > 
> > > > > Andrey
> > > > > 
> > > > > 
> > > > > > > Andrey
> > > > > > > 
> > > > > > > 
> > > > > > > > It doesn't
> > > > > > > > solve all your problems, but it's a tool to get there.
> > > > > > > > -Daniel
> > > > > > > > 
> > > > > > > > > Andrey
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > - handle fbcon somehow. I think shutting it all down should work out.
> > > > > > > > > > - worst case keep the system backing storage around for shared dma-buf
> > > > > > > > > > until the other non-dynamic driver releases it. for vram we require
> > > > > > > > > > dynamic importers (and maybe it wasn't such a bright idea to allow
> > > > > > > > > > pinning of importer buffers, might need to revisit that).
> > > > > > > > > > 
> > > > > > > > > > Cheers, Daniel
> > > > > > > > > > 
> > > > > > > > > > > Christian.
> > > > > > > > > > > 
> > > > > > > > > > > > Andrey
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > > -Daniel
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > Christian.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I loaded the driver with vm_update_mode=3
> > > > > > > > > > > > > > > meaning all VM updates done using CPU and hasn't seen any OOPs after
> > > > > > > > > > > > > > > removing the device. I guess i can test it more by allocating GTT and
> > > > > > > > > > > > > > > VRAM BOs
> > > > > > > > > > > > > > > and trying to read/write to them after device is removed.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Andrey
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > Christian.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Andrey
> > > > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > > > amd-gfx mailing list
> > > > > > > > > > > > > > > amd-gfx@lists.freedesktop.org
> > > > > > > > > > > > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > 
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
@ 2020-12-18 14:30                                                     ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2020-12-18 14:30 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: Rob Herring, Greg KH, dri-devel, Anholt, Eric, Pekka Paalanen,
	amd-gfx list, Daniel Vetter, Alex Deucher, Qiang Yu, Wentland,
	Harry, Christian König, Lucas Stach

On Thu, Dec 17, 2020 at 04:06:38PM -0500, Andrey Grodzovsky wrote:
> 
> On 12/17/20 3:48 PM, Daniel Vetter wrote:
> > On Thu, Dec 17, 2020 at 9:38 PM Andrey Grodzovsky
> > <Andrey.Grodzovsky@amd.com> wrote:
> > > 
> > > On 12/17/20 3:10 PM, Christian König wrote:
> > > > [SNIP]
> > > > > > > By eliminating such users, and replacing them with local maps which
> > > > > > > > are strictly bound in how long they can exist (and hence we can
> > > > > > > > serialize against them finishing in our hotunplug code).
> > > > > > > Not sure I see how serializing against BO map/unmap helps - our problem, as
> > > > > > > you described it, is that once the
> > > > > > > device is extracted and then something else quickly takes its place in the
> > > > > > > PCI topology
> > > > > > > and gets assigned the same physical IO ranges, then our driver will start
> > > > > > > accessing this
> > > > > > > new device because our 'zombie' BOs are still pointing to those ranges.
> > > > > > Until your driver's remove callback is finished the ranges stay reserved.
> > > > > 
> > > > > The ranges stay reserved until unmapped which happens in bo->destroy
> > > > I'm not sure of that. Why do you think that?
> > > 
> > > Because of this sequence
> > > ttm_bo_release->destroy->amdgpu_bo_destroy->amdgpu_bo_kunmap->...->iounmap
> > > Is there another place I am missing ?
> > iounmap is just the mapping, it doesn't reserve anything in the resource tree.
> > 
> > And I don't think we should keep resources reserved past the pci
> > remove callback, because that would upset the pci subsystem trying to
> > assign resources to a newly hotplugged pci device.
> 
> 
> I assumed we are talking about VA ranges still mapped in the page table. I
> just assumed
> that part of ioremap is also reservation of the mapped physical ranges. In
> fact, if we
> can explicitly reserve those ranges (as you mention here) then, together
> with postponing
> the freeing/releasing of system memory pages back to the page pool until after the BO
> is unmapped
> from the kernel address space, I believe this could solve the issue of quick
> HW reinsertion
> and make all the drm_dev_enter/exit guarding obsolete.

We can't reserve these ranges, that's what I tried to explain:
- kernel/resource.c isn't very consistently used
- the pci core will get pissed if there's suddenly a range in the middle
  of a bridge that it can't use
- nesting is allowed for resources, so this doesn't actually guarantee
  much

I just wanted to point out that ioremap doesn't do any reserving, so it's
not enough by far.

We really have to stop using any mmio ranges before the pci remove
callback is finished.
-Daniel

> 
> Andrey
> 
> 
> > Also from a quick check amdgpu does not reserve the pci bars it's
> > using. Somehow most drm drivers don't do that, not exactly sure why,
> > maybe auto-enumeration of resources just works too good and we don't
> > need the safety net of kernel/resource.c anymore.
> > -Daniel
> > 
> > 
> > > > > which for most internally allocated buffers is during sw_fini when last drm_put
> > > > > is called.
> > > > > 
> > > > > 
> > > > > > If that's not the case, then hotunplug would be fundamentally impossible
> > > > > > ot handle correctly.
> > > > > > 
> > > > > > Of course all the mmio actions will time out, so it might take some time
> > > > > > to get through it all.
> > > > > 
> > > > > I found that the PCI code provides a pci_device_is_present function
> > > > > we can use to avoid timeouts - it reads the device vendor ID and checks whether
> > > > > all 1s is returned
> > > > > or not. We can call it from within the register accessors before trying to read/write
> > > > That's way too much overhead! We need to keep that much lower or it will result
> > > > in quite a performance drop.
> > > > 
> > > > I suggest to rather think about adding drm_dev_enter/exit guards.
> > > 
> > > Sure, this one is just a bit upstream to the disconnect event. Eventually none
> > > of them is watertight.
> > > 
> > > Andrey
> > > 
> > > 
> > > > Christian.
> > > > 
> > > > > > > Another point regarding serializing - the problem is that some of those BOs are
> > > > > > > very long lived; take for example the HW command
> > > > > > > ring buffer Christian mentioned before
> > > > > > > (amdgpu_ring_init->amdgpu_bo_create_kernel). Its life span
> > > > > > > is basically the entire time the device exists; it's destroyed only in
> > > > > > > the SW fini stage (when the last drm_dev
> > > > > > > reference is dropped). So should I grab its dma_resv_lock from
> > > > > > > the amdgpu_pci_remove code and wait
> > > > > > > for it to be unmapped before proceeding with the PCI remove code? This can
> > > > > > > take unbounded time, and that's why I don't understand
> > > > > > > how serializing will help.
> > > > > > Uh you need to untangle that. After hw cleanup is done no one is allowed
> > > > > > to touch that ringbuffer bo anymore from the kernel.
> > > > > 
> > > > > I would assume we are not allowed to touch it once we identified the device is
> > > > > gone in order to minimize the chance of accidental writes to some other
> > > > > device which might now
> > > > > occupy those IO ranges ?
> > > > > 
> > > > > 
> > > > > >    That's what
> > > > > > drm_dev_enter/exit guards are for. Like you say we cant wait for all sw
> > > > > > references to disappear.
> > > > > 
> > > > > Yes, it didn't make sense to me why we would use vmap_local for internally
> > > > > allocated buffers. I think we should also guard register reads/writes for the
> > > > > same reason as above.
> > > > > 
> > > > > 
> > > > > > The vmap_local is for mappings done by other drivers, through the dma-buf
> > > > > > interface (where "other drivers" can include fbdev/fbcon, if you use the
> > > > > > generic helpers).
> > > > > > -Daniel
> > > > > 
> > > > > Ok, so I assumed that with vmap_local you were trying to solve the problem of
> > > > > quick reinsertion
> > > > > of another device into the same MMIO range that my driver still points to, but
> > > > > actually are you trying to solve
> > > > > the issue of exported dma buffers outliving the device? For this we have the
> > > > > drm_device refcount in the GEM layer,
> > > > > I think.
> > > > > 
> > > > > Andrey
> > > > > 
> > > > > 
> > > > > > > Andrey
> > > > > > > 
> > > > > > > 
> > > > > > > > It doesn't
> > > > > > > > solve all your problems, but it's a tool to get there.
> > > > > > > > -Daniel
> > > > > > > > 
> > > > > > > > > Andrey
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > - handle fbcon somehow. I think shutting it all down should work out.
> > > > > > > > > > - worst case keep the system backing storage around for shared dma-buf
> > > > > > > > > > until the other non-dynamic driver releases it. for vram we require
> > > > > > > > > > dynamic importers (and maybe it wasn't such a bright idea to allow
> > > > > > > > > > pinning of importer buffers, might need to revisit that).
> > > > > > > > > > 
> > > > > > > > > > Cheers, Daniel
> > > > > > > > > > 
> > > > > > > > > > > Christian.
> > > > > > > > > > > 
> > > > > > > > > > > > Andrey
> > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > > -Daniel
> > > > > > > > > > > > > 
> > > > > > > > > > > > > > Christian.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > I loaded the driver with vm_update_mode=3
> > > > > > > > > > > > > > > meaning all VM updates done using CPU and hasn't seen any OOPs after
> > > > > > > > > > > > > > > removing the device. I guess i can test it more by allocating GTT and
> > > > > > > > > > > > > > > VRAM BOs
> > > > > > > > > > > > > > > and trying to read/write to them after device is removed.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Andrey
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > > > > Christian.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > > Andrey
> > > > > > > > > > > > > > > _______________________________________________
> > > > > > > > > > > > > > > amd-gfx mailing list
> > > > > > > > > > > > > > > amd-gfx@lists.freedesktop.org
> > > > > > > > > > > > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > 
> > 
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use
  2020-12-17 21:13                                               ` Andrey Grodzovsky
@ 2021-01-04 16:33                                                 ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-04 16:33 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Greg KH, dri-devel, amd-gfx list, Alex Deucher,
	Christian König, Qiang Yu


[-- Attachment #1.1: Type: text/plain, Size: 1610 bytes --]

Hey Daniel, back from vacation and going over our last long thread, I think you 
didn't reply
to my last question below (or at least I can't find it).

Andrey

On 12/17/20 4:13 PM, Andrey Grodzovsky wrote:
>>> Ok, so I assumed that with vmap_local you were trying to solve the problem of
>>> quick reinsertion
>>> of another device into the same MMIO range that my driver still points to, but
>>> actually are you trying to solve
>>> the issue of exported dma buffers outliving the device? For this we have the
>>> drm_device refcount in the GEM layer,
>>> I think.
>> Those are completely different lifetime problems. Don't mix them up :-)
>> One problem is the hardware disappearing, and for that we _have_ to
>> guarantee timeliness, or otherwise the pci subsystem gets pissed
>> (since like you say, a new device might show up and need its mmio
>> bars assigned to io ranges). The other is the lifetime of the software
>> objects we use as interfaces, both from userspace and from other
>> kernel drivers. There we fundamentally can't enforce timely cleanup,
>> and have to resort to refcounting.
>
>
> So regarding the second issue, as I mentioned above, don't we already use 
> drm_dev_get/put
> for exported BOs ? Earlier in this discussion you mentioned that we are ok for 
> dma buffers since
> we already have the refcounting at the GEM layer and the real life cycle 
> problem we have is the dma_fences
> for which there is no drm_dev refcounting. Seems to me then that vmap_local is 
> superfluous because
> of the recounting we already have for exported dma_bufs and for dma_fences it 
> won't help.
>
> Andrey 

[-- Attachment #1.2: Type: text/html, Size: 2753 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2020-11-23  8:01         ` Christian König
@ 2021-01-05 21:04           ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-05 21:04 UTC (permalink / raw)
  To: christian.koenig, amd-gfx, dri-devel, daniel.vetter, robh,
	l.stach, yuq825, eric
  Cc: Alexander.Deucher, gregkh


On 11/23/20 3:01 AM, Christian König wrote:
> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>
>> On 11/21/20 9:15 AM, Christian König wrote:
>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>> Will be used to reroute CPU mapped BO's page faults once
>>>> device is removed.
>>>
>>> Uff, one page for each exported DMA-buf? That's not something we can do.
>>>
>>> We need to find a different approach here.
>>>
>>> Can't we call alloc_page() on each fault and link them together so they are 
>>> freed when the device is finally reaped?
>>
>>
>> For sure better to optimize and allocate on demand when we reach this corner 
>> case, but why the linking ?
>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>
> I want to avoid keeping the page in the GEM object.
>
> What we can do is to allocate a page on demand for each fault and link them 
> together in the bdev instead.
>
> And when the bdev is then finally destroyed after the last application closed 
> we can finally release all of them.
>
> Christian.


Hey, started to implement this and then realized that by allocating a page for 
each fault indiscriminately
we will be allocating a new page for each faulting virtual address within a VA 
range belonging to the same BO,
and this is obviously too much and not the intention. Should I instead use, let's 
say, a hashtable with the hash
key being the faulting BO address, to keep allocating and reusing the same dummy 
zero page per GEM BO
(or, for that matter, the DRM file object address for non-imported BOs)?

Andrey


>
>>
>> Andrey
>>
>>
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>> ---
>>>>   drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>   include/drm/drm_file.h      |  2 ++
>>>>   include/drm/drm_gem.h       |  2 ++
>>>>   4 files changed, 22 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>> index 0ac4566..ff3d39f 100644
>>>> --- a/drivers/gpu/drm/drm_file.c
>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>>>               goto out_prime_destroy;
>>>>       }
>>>>   +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>> +    if (!file->dummy_page) {
>>>> +        ret = -ENOMEM;
>>>> +        goto out_prime_destroy;
>>>> +    }
>>>> +
>>>>       return file;
>>>>     out_prime_destroy:
>>>> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>>>>       if (dev->driver->postclose)
>>>>           dev->driver->postclose(dev, file);
>>>>   +    __free_page(file->dummy_page);
>>>> +
>>>>       drm_prime_destroy_file_private(&file->prime);
>>>>         WARN_ON(!list_empty(&file->event_list));
>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>>> index 1693aa7..987b45c 100644
>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>>>         ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>               dma_buf, *handle);
>>>> +
>>>> +    if (!ret) {
>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>> +        if (!obj->dummy_page)
>>>> +            ret = -ENOMEM;
>>>> +    }
>>>> +
>>>>       mutex_unlock(&file_priv->prime.lock);
>>>>       if (ret)
>>>>           goto fail;
>>>> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct drm_gem_object 
>>>> *obj, struct sg_table *sg)
>>>>           dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>       dma_buf = attach->dmabuf;
>>>>       dma_buf_detach(attach->dmabuf, attach);
>>>> +
>>>> +    __free_page(obj->dummy_page);
>>>> +
>>>>       /* remove the reference */
>>>>       dma_buf_put(dma_buf);
>>>>   }
>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>> index 716990b..2a011fc 100644
>>>> --- a/include/drm/drm_file.h
>>>> +++ b/include/drm/drm_file.h
>>>> @@ -346,6 +346,8 @@ struct drm_file {
>>>>        */
>>>>       struct drm_prime_file_private prime;
>>>>   +    struct page *dummy_page;
>>>> +
>>>>       /* private: */
>>>>   #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>       unsigned long lock_count; /* DRI1 legacy lock count */
>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>> index 337a483..76a97a3 100644
>>>> --- a/include/drm/drm_gem.h
>>>> +++ b/include/drm/drm_gem.h
>>>> @@ -311,6 +311,8 @@ struct drm_gem_object {
>>>>        *
>>>>        */
>>>>       const struct drm_gem_object_funcs *funcs;
>>>> +
>>>> +    struct page *dummy_page;
>>>>   };
>>>>     /**
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx 
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx 
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-05 21:04           ` Andrey Grodzovsky
@ 2021-01-07 16:21             ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2021-01-07 16:21 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher,
	yuq825, christian.koenig

On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> 
> On 11/23/20 3:01 AM, Christian König wrote:
> > Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> > > 
> > > On 11/21/20 9:15 AM, Christian König wrote:
> > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > Will be used to reroute CPU mapped BO's page faults once
> > > > > device is removed.
> > > > 
> > > > Uff, one page for each exported DMA-buf? That's not something we can do.
> > > > 
> > > > We need to find a different approach here.
> > > > 
> > > > Can't we call alloc_page() on each fault and link them together
> > > > so they are freed when the device is finally reaped?
> > > 
> > > 
> > > For sure better to optimize and allocate on demand when we reach
> > > this corner case, but why the linking ?
> > > Shouldn't drm_prime_gem_destroy be good enough place to free ?
> > 
> > I want to avoid keeping the page in the GEM object.
> > 
> > What we can do is to allocate a page on demand for each fault and link
> > them together in the bdev instead.
> > 
> > And when the bdev is then finally destroyed after the last application
> > closed we can finally release all of them.
> > 
> > Christian.
> 
> 
> Hey, started to implement this and then realized that by allocating a page
> for each fault indiscriminately
> we will be allocating a new page for each faulting virtual address within a
> VA range belonging to the same BO,
> and this is obviously too much and not the intention. Should I instead use,
> let's say, a hashtable with the hash
> key being the faulting BO address, to keep allocating and reusing the same
> dummy zero page per GEM BO
> (or, for that matter, the DRM file object address for non-imported BOs)?

Why do we need a hashtable? All the sw structures to track this should
still be around:
- if gem_bo->dma_buf is set the buffer is currently exported as a dma-buf,
  so defensively allocate a per-bo page
- otherwise allocate a per-file page

Or is the idea to save the struct page * pointer? That feels a bit like
over-optimizing stuff. Better to have a simple implementation first and
then tune it if (and only if) any part of it becomes a problem for normal
usage.
-Daniel

> 
> Andrey
> 
> 
> > 
> > > 
> > > Andrey
> > > 
> > > 
> > > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > > 
> > > > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > > ---
> > > > >   drivers/gpu/drm/drm_file.c  |  8 ++++++++
> > > > >   drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> > > > >   include/drm/drm_file.h      |  2 ++
> > > > >   include/drm/drm_gem.h       |  2 ++
> > > > >   4 files changed, 22 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > > > > index 0ac4566..ff3d39f 100644
> > > > > --- a/drivers/gpu/drm/drm_file.c
> > > > > +++ b/drivers/gpu/drm/drm_file.c
> > > > > @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
> > > > >               goto out_prime_destroy;
> > > > >       }
> > > > >   +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > > > > +    if (!file->dummy_page) {
> > > > > +        ret = -ENOMEM;
> > > > > +        goto out_prime_destroy;
> > > > > +    }
> > > > > +
> > > > >       return file;
> > > > >     out_prime_destroy:
> > > > > @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
> > > > >       if (dev->driver->postclose)
> > > > >           dev->driver->postclose(dev, file);
> > > > >   +    __free_page(file->dummy_page);
> > > > > +
> > > > >       drm_prime_destroy_file_private(&file->prime);
> > > > >         WARN_ON(!list_empty(&file->event_list));
> > > > > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > > > > index 1693aa7..987b45c 100644
> > > > > --- a/drivers/gpu/drm/drm_prime.c
> > > > > +++ b/drivers/gpu/drm/drm_prime.c
> > > > > @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
> > > > >         ret = drm_prime_add_buf_handle(&file_priv->prime,
> > > > >               dma_buf, *handle);
> > > > > +
> > > > > +    if (!ret) {
> > > > > +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > > > > +        if (!obj->dummy_page)
> > > > > +            ret = -ENOMEM;
> > > > > +    }
> > > > > +
> > > > >       mutex_unlock(&file_priv->prime.lock);
> > > > >       if (ret)
> > > > >           goto fail;
> > > > > @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct
> > > > > drm_gem_object *obj, struct sg_table *sg)
> > > > >           dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
> > > > >       dma_buf = attach->dmabuf;
> > > > >       dma_buf_detach(attach->dmabuf, attach);
> > > > > +
> > > > > +    __free_page(obj->dummy_page);
> > > > > +
> > > > >       /* remove the reference */
> > > > >       dma_buf_put(dma_buf);
> > > > >   }
> > > > > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > > > > index 716990b..2a011fc 100644
> > > > > --- a/include/drm/drm_file.h
> > > > > +++ b/include/drm/drm_file.h
> > > > > @@ -346,6 +346,8 @@ struct drm_file {
> > > > >        */
> > > > >       struct drm_prime_file_private prime;
> > > > >   +    struct page *dummy_page;
> > > > > +
> > > > >       /* private: */
> > > > >   #if IS_ENABLED(CONFIG_DRM_LEGACY)
> > > > >       unsigned long lock_count; /* DRI1 legacy lock count */
> > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > index 337a483..76a97a3 100644
> > > > > --- a/include/drm/drm_gem.h
> > > > > +++ b/include/drm/drm_gem.h
> > > > > @@ -311,6 +311,8 @@ struct drm_gem_object {
> > > > >        *
> > > > >        */
> > > > >       const struct drm_gem_object_funcs *funcs;
> > > > > +
> > > > > +    struct page *dummy_page;
> > > > >   };
> > > > >     /**
> > > > 
> > > _______________________________________________
> > > amd-gfx mailing list
> > > amd-gfx@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > 
> > 
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-07 16:21             ` Daniel Vetter
@ 2021-01-07 16:26               ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-07 16:26 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher,
	yuq825, christian.koenig


On 1/7/21 11:21 AM, Daniel Vetter wrote:
> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>> On 11/23/20 3:01 AM, Christian König wrote:
>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>> device is removed.
>>>>> Uff, one page for each exported DMA-buf? That's not something we can do.
>>>>>
>>>>> We need to find a different approach here.
>>>>>
>>>>> Can't we call alloc_page() on each fault and link them together
>>>>> so they are freed when the device is finally reaped?
>>>>
>>>> For sure better to optimize and allocate on demand when we reach
>>>> this corner case, but why the linking ?
>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>> I want to avoid keeping the page in the GEM object.
>>>
>>> What we can do is to allocate a page on demand for each fault and link
>>> the together in the bdev instead.
>>>
>>> And when the bdev is then finally destroyed after the last application
>>> closed we can finally release all of them.
>>>
>>> Christian.
>>
>> Hey, started to implement this and then realized that by allocating a page
>> for each fault indiscriminately
>> we will be allocating a new page for each faulting virtual address within a
>> VA range belonging the same BO
>> and this is obviously too much and not the intention. Should I instead use
>> let's say a hashtable with the hash
>> key being faulting BO address to actually keep allocating and reusing same
>> dummy zero page per GEM BO
>> (or for that matter DRM file object address for non imported BOs) ?
> Why do we need a hashtable? All the sw structures to track this should
> still be around:
> - if gem_bo->dma_buf is set the buffer is currently exported as a dma-buf,
>    so defensively allocate a per-bo page
> - otherwise allocate a per-file page


That's exactly what we have in the current implementation.


>
> Or is the idea to save the struct page * pointer? That feels a bit like
> over-optimizing stuff. Better to have a simple implementation first and
> then tune it if (and only if) any part of it becomes a problem for normal
> usage.


Exactly - the idea is to avoid adding an extra pointer to drm_gem_object.
Christian suggested to instead keep a linked list of dummy pages to be
allocated on demand once we hit a vm_fault. I will then also prefault the entire
VA range from vma->vm_end - vma->vm_start to vma->vm_end and map it
to that single dummy page.

Andrey


> -Daniel
>
>> Andrey
>>
>>
>>>> Andrey
>>>>
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>> ---
>>>>>>    drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>    drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>    include/drm/drm_file.h      |  2 ++
>>>>>>    include/drm/drm_gem.h       |  2 ++
>>>>>>    4 files changed, 22 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>> index 0ac4566..ff3d39f 100644
>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>>>>>                goto out_prime_destroy;
>>>>>>        }
>>>>>>    +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>> +    if (!file->dummy_page) {
>>>>>> +        ret = -ENOMEM;
>>>>>> +        goto out_prime_destroy;
>>>>>> +    }
>>>>>> +
>>>>>>        return file;
>>>>>>      out_prime_destroy:
>>>>>> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>        if (dev->driver->postclose)
>>>>>>            dev->driver->postclose(dev, file);
>>>>>>    +    __free_page(file->dummy_page);
>>>>>> +
>>>>>>        drm_prime_destroy_file_private(&file->prime);
>>>>>>          WARN_ON(!list_empty(&file->event_list));
>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>>>>> index 1693aa7..987b45c 100644
>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>>>>>          ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>                dma_buf, *handle);
>>>>>> +
>>>>>> +    if (!ret) {
>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>> +        if (!obj->dummy_page)
>>>>>> +            ret = -ENOMEM;
>>>>>> +    }
>>>>>> +
>>>>>>        mutex_unlock(&file_priv->prime.lock);
>>>>>>        if (ret)
>>>>>>            goto fail;
>>>>>> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct
>>>>>> drm_gem_object *obj, struct sg_table *sg)
>>>>>>            dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>>>        dma_buf = attach->dmabuf;
>>>>>>        dma_buf_detach(attach->dmabuf, attach);
>>>>>> +
>>>>>> +    __free_page(obj->dummy_page);
>>>>>> +
>>>>>>        /* remove the reference */
>>>>>>        dma_buf_put(dma_buf);
>>>>>>    }
>>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>>> index 716990b..2a011fc 100644
>>>>>> --- a/include/drm/drm_file.h
>>>>>> +++ b/include/drm/drm_file.h
>>>>>> @@ -346,6 +346,8 @@ struct drm_file {
>>>>>>         */
>>>>>>        struct drm_prime_file_private prime;
>>>>>>    +    struct page *dummy_page;
>>>>>> +
>>>>>>        /* private: */
>>>>>>    #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>>>        unsigned long lock_count; /* DRI1 legacy lock count */
>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>>> index 337a483..76a97a3 100644
>>>>>> --- a/include/drm/drm_gem.h
>>>>>> +++ b/include/drm/drm_gem.h
>>>>>> @@ -311,6 +311,8 @@ struct drm_gem_object {
>>>>>>         *
>>>>>>         */
>>>>>>        const struct drm_gem_object_funcs *funcs;
>>>>>> +
>>>>>> +    struct page *dummy_page;
>>>>>>    };
>>>>>>      /**
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-07 16:26               ` Andrey Grodzovsky
@ 2021-01-07 16:28                 ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-07 16:28 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Alexander.Deucher,
	christian.koenig, yuq825


[-- Attachment #1.1: Type: text/plain, Size: 726 bytes --]

Typo correction below

On 1/7/21 11:26 AM, Andrey Grodzovsky wrote:
>>
>> Or is the idea to save the struct page * pointer? That feels a bit like
>> over-optimizing stuff. Better to have a simple implementation first and
>> then tune it if (and only if) any part of it becomes a problem for normal
>> usage.
>
>
> Exactly - the idea is to avoid adding extra pointer to drm_gem_object,
> Christian suggested to instead keep a linked list of dummy pages to be
> allocated on demand once we hit a vm_fault. I will then also prefault the entire
> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map them
> to that single dummy page.


Obviously the range is from vma->vm_start to vma->vm_end

Andrey


>
> Andrey 

[-- Attachment #1.2: Type: text/html, Size: 1545 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-07 16:26               ` Andrey Grodzovsky
@ 2021-01-07 16:30                 ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2021-01-07 16:30 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher,
	yuq825, christian.koenig

On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> 
> On 1/7/21 11:21 AM, Daniel Vetter wrote:
> > On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> > > On 11/23/20 3:01 AM, Christian König wrote:
> > > > Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> > > > > On 11/21/20 9:15 AM, Christian König wrote:
> > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > Will be used to reroute CPU mapped BO's page faults once
> > > > > > > device is removed.
> > > > > > Uff, one page for each exported DMA-buf? That's not something we can do.
> > > > > > 
> > > > > > We need to find a different approach here.
> > > > > > 
> > > > > > Can't we call alloc_page() on each fault and link them together
> > > > > > so they are freed when the device is finally reaped?
> > > > > 
> > > > > For sure better to optimize and allocate on demand when we reach
> > > > > this corner case, but why the linking ?
> > > > > Shouldn't drm_prime_gem_destroy be good enough place to free ?
> > > > I want to avoid keeping the page in the GEM object.
> > > > 
> > > > What we can do is to allocate a page on demand for each fault and link
> > > > them together in the bdev instead.
> > > > 
> > > > And when the bdev is then finally destroyed after the last application
> > > > closed we can finally release all of them.
> > > > 
> > > > Christian.
> > > 
> > > Hey, started to implement this and then realized that by allocating a page
> > > for each fault indiscriminately
> > > we will be allocating a new page for each faulting virtual address within a
> > > VA range belonging to the same BO
> > > and this is obviously too much and not the intention. Should I instead use
> > > let's say a hashtable with the hash
> > > key being faulting BO address to actually keep allocating and reusing same
> > > dummy zero page per GEM BO
> > > (or for that matter DRM file object address for non imported BOs) ?
> > Why do we need a hashtable? All the sw structures to track this should
> > still be around:
> > - if gem_bo->dma_buf is set the buffer is currently exported as a dma-buf,
> >    so defensively allocate a per-bo page
> > - otherwise allocate a per-file page
> 
> 
> That's exactly what we have in the current implementation
> 
> 
> > 
> > Or is the idea to save the struct page * pointer? That feels a bit like
> > over-optimizing stuff. Better to have a simple implementation first and
> > then tune it if (and only if) any part of it becomes a problem for normal
> > usage.
> 
> 
> Exactly - the idea is to avoid adding extra pointer to drm_gem_object,
> Christian suggested to instead keep a linked list of dummy pages to be
> allocated on demand once we hit a vm_fault. I will then also prefault the entire
> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map them
> to that single dummy page.

This strongly feels like premature optimization. If you're worried about
the overhead on amdgpu, pay down the debt by removing one of the redundant
pointers between gem and ttm bo structs (I think we still have some) :-)

Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
pointer just because" games with hashtables.
-Daniel

> 
> Andrey
> 
> 
> > -Daniel
> > 
> > > Andrey
> > > 
> > > 
> > > > > Andrey
> > > > > 
> > > > > 
> > > > > > Regards,
> > > > > > Christian.
> > > > > > 
> > > > > > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
> > > > > > > ---
> > > > > > >    drivers/gpu/drm/drm_file.c  |  8 ++++++++
> > > > > > >    drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> > > > > > >    include/drm/drm_file.h      |  2 ++
> > > > > > >    include/drm/drm_gem.h       |  2 ++
> > > > > > >    4 files changed, 22 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > > > > > > index 0ac4566..ff3d39f 100644
> > > > > > > --- a/drivers/gpu/drm/drm_file.c
> > > > > > > +++ b/drivers/gpu/drm/drm_file.c
> > > > > > > @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
> > > > > > >                goto out_prime_destroy;
> > > > > > >        }
> > > > > > >    +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > > > > > > +    if (!file->dummy_page) {
> > > > > > > +        ret = -ENOMEM;
> > > > > > > +        goto out_prime_destroy;
> > > > > > > +    }
> > > > > > > +
> > > > > > >        return file;
> > > > > > >      out_prime_destroy:
> > > > > > > @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
> > > > > > >        if (dev->driver->postclose)
> > > > > > >            dev->driver->postclose(dev, file);
> > > > > > >    +    __free_page(file->dummy_page);
> > > > > > > +
> > > > > > >        drm_prime_destroy_file_private(&file->prime);
> > > > > > >          WARN_ON(!list_empty(&file->event_list));
> > > > > > > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > > > > > > index 1693aa7..987b45c 100644
> > > > > > > --- a/drivers/gpu/drm/drm_prime.c
> > > > > > > +++ b/drivers/gpu/drm/drm_prime.c
> > > > > > > @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
> > > > > > >          ret = drm_prime_add_buf_handle(&file_priv->prime,
> > > > > > >                dma_buf, *handle);
> > > > > > > +
> > > > > > > +    if (!ret) {
> > > > > > > +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > > > > > > +        if (!obj->dummy_page)
> > > > > > > +            ret = -ENOMEM;
> > > > > > > +    }
> > > > > > > +
> > > > > > >        mutex_unlock(&file_priv->prime.lock);
> > > > > > >        if (ret)
> > > > > > >            goto fail;
> > > > > > > @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct
> > > > > > > drm_gem_object *obj, struct sg_table *sg)
> > > > > > >            dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
> > > > > > >        dma_buf = attach->dmabuf;
> > > > > > >        dma_buf_detach(attach->dmabuf, attach);
> > > > > > > +
> > > > > > > +    __free_page(obj->dummy_page);
> > > > > > > +
> > > > > > >        /* remove the reference */
> > > > > > >        dma_buf_put(dma_buf);
> > > > > > >    }
> > > > > > > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > > > > > > index 716990b..2a011fc 100644
> > > > > > > --- a/include/drm/drm_file.h
> > > > > > > +++ b/include/drm/drm_file.h
> > > > > > > @@ -346,6 +346,8 @@ struct drm_file {
> > > > > > >         */
> > > > > > >        struct drm_prime_file_private prime;
> > > > > > >    +    struct page *dummy_page;
> > > > > > > +
> > > > > > >        /* private: */
> > > > > > >    #if IS_ENABLED(CONFIG_DRM_LEGACY)
> > > > > > >        unsigned long lock_count; /* DRI1 legacy lock count */
> > > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > > > index 337a483..76a97a3 100644
> > > > > > > --- a/include/drm/drm_gem.h
> > > > > > > +++ b/include/drm/drm_gem.h
> > > > > > > @@ -311,6 +311,8 @@ struct drm_gem_object {
> > > > > > >         *
> > > > > > >         */
> > > > > > >        const struct drm_gem_object_funcs *funcs;
> > > > > > > +
> > > > > > > +    struct page *dummy_page;
> > > > > > >    };
> > > > > > >      /**
> > > > > _______________________________________________
> > > > > amd-gfx mailing list
> > > > > amd-gfx@lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > > > 
> > > > _______________________________________________
> > > > amd-gfx mailing list
> > > > amd-gfx@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> > > > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-07 16:30                 ` Daniel Vetter
@ 2021-01-07 16:37                   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-07 16:37 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Alexander.Deucher,
	yuq825, christian.koenig


On 1/7/21 11:30 AM, Daniel Vetter wrote:
> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>> device is removed.
>>>>>>> Uff, one page for each exported DMA-buf? That's not something we can do.
>>>>>>>
>>>>>>> We need to find a different approach here.
>>>>>>>
>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>> so they are freed when the device is finally reaped?
>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>> this corner case, but why the linking ?
>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>> I want to avoid keeping the page in the GEM object.
>>>>>
>>>>> What we can do is to allocate a page on demand for each fault and link
>>>>> them together in the bdev instead.
>>>>>
>>>>> And when the bdev is then finally destroyed after the last application
>>>>> closed we can finally release all of them.
>>>>>
>>>>> Christian.
>>>> Hey, started to implement this and then realized that by allocating a page
>>>> for each fault indiscriminately
>>>> we will be allocating a new page for each faulting virtual address within a
>>>> VA range belonging to the same BO
>>>> and this is obviously too much and not the intention. Should I instead use
>>>> let's say a hashtable with the hash
>>>> key being faulting BO address to actually keep allocating and reusing same
>>>> dummy zero page per GEM BO
>>>> (or for that matter DRM file object address for non imported BOs) ?
>>> Why do we need a hashtable? All the sw structures to track this should
>>> still be around:
>>> - if gem_bo->dma_buf is set the buffer is currently exported as a dma-buf,
>>>     so defensively allocate a per-bo page
>>> - otherwise allocate a per-file page
>>
>> That's exactly what we have in the current implementation
>>
>>
>>> Or is the idea to save the struct page * pointer? That feels a bit like
>>> over-optimizing stuff. Better to have a simple implementation first and
>>> then tune it if (and only if) any part of it becomes a problem for normal
>>> usage.
>>
>> Exactly - the idea is to avoid adding extra pointer to drm_gem_object,
>> Christian suggested to instead keep a linked list of dummy pages to be
>> allocated on demand once we hit a vm_fault. I will then also prefault the entire
>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map them
>> to that single dummy page.
> This strongly feels like premature optimization. If you're worried about
> the overhead on amdgpu, pay down the debt by removing one of the redundant
> pointers between gem and ttm bo structs (I think we still have some) :-)
>
> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> pointer just because" games with hashtables.
> -Daniel


Well, if you and Christian can agree on this approach and suggest which pointer
is redundant and can be removed from the GEM struct, so that we can use the
'credit' to add the dummy page to GEM, I will be happy to follow through.

P.S. The hash table is off the table anyway; we are only talking about a linked
list here. By prefaulting the entire VA range for a vmf->vma I will avoid
redundant page faults into the same VMA VA range, and so I don't need to search
for and reuse an existing dummy page but can simply create a new one for each
fault.

Andrey


>
>> Andrey
>>
>>
>>> -Daniel
>>>
>>>> Andrey
>>>>
>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>>>> index 0ac4566..ff3d39f 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>         }
>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>> +        ret = -ENOMEM;
>>>>>>>> +        goto out_prime_destroy;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         return file;
>>>>>>>>       out_prime_destroy:
>>>>>>>> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>             dev->driver->postclose(dev, file);
>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>> +
>>>>>>>>         drm_prime_destroy_file_private(&file->prime);
>>>>>>>>           WARN_ON(!list_empty(&file->event_list));
>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>>>>>>> index 1693aa7..987b45c 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>                 dma_buf, *handle);
>>>>>>>> +
>>>>>>>> +    if (!ret) {
>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>> +            ret = -ENOMEM;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         mutex_unlock(&file_priv->prime.lock);
>>>>>>>>         if (ret)
>>>>>>>>             goto fail;
>>>>>>>> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct
>>>>>>>> drm_gem_object *obj, struct sg_table *sg)
>>>>>>>>             dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>>>>>         dma_buf = attach->dmabuf;
>>>>>>>>         dma_buf_detach(attach->dmabuf, attach);
>>>>>>>> +
>>>>>>>> +    __free_page(obj->dummy_page);
>>>>>>>> +
>>>>>>>>         /* remove the reference */
>>>>>>>>         dma_buf_put(dma_buf);
>>>>>>>>     }
>>>>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>>>>> index 716990b..2a011fc 100644
>>>>>>>> --- a/include/drm/drm_file.h
>>>>>>>> +++ b/include/drm/drm_file.h
>>>>>>>> @@ -346,6 +346,8 @@ struct drm_file {
>>>>>>>>          */
>>>>>>>>         struct drm_prime_file_private prime;
>>>>>>>>     +    struct page *dummy_page;
>>>>>>>> +
>>>>>>>>         /* private: */
>>>>>>>>     #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>>>>>         unsigned long lock_count; /* DRI1 legacy lock count */
>>>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>>>>> index 337a483..76a97a3 100644
>>>>>>>> --- a/include/drm/drm_gem.h
>>>>>>>> +++ b/include/drm/drm_gem.h
>>>>>>>> @@ -311,6 +311,8 @@ struct drm_gem_object {
>>>>>>>>          *
>>>>>>>>          */
>>>>>>>>         const struct drm_gem_object_funcs *funcs;
>>>>>>>> +
>>>>>>>> +    struct page *dummy_page;
>>>>>>>>     };
>>>>>>>>       /**
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2021-01-07 16:37                   ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-07 16:37 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: robh, daniel.vetter, dri-devel, eric, ppaalanen, amd-gfx, gregkh,
	Alexander.Deucher, yuq825, Harry.Wentland, christian.koenig,
	l.stach


On 1/7/21 11:30 AM, Daniel Vetter wrote:
> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>> device is removed.
>>>>>>> Uff, one page for each exported DMA-buf? That's not something we can do.
>>>>>>>
>>>>>>> We need to find a different approach here.
>>>>>>>
>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>> so they are freed when the device is finally reaped?
>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>> this corner case, but why the linking ?
>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>> I want to avoid keeping the page in the GEM object.
>>>>>
>>>>> What we can do is to allocate a page on demand for each fault and link
>>>>> them together in the bdev instead.
>>>>>
>>>>> And when the bdev is then finally destroyed after the last application
>>>>> closed we can finally release all of them.
>>>>>
>>>>> Christian.
>>>> Hey, started to implement this and then realized that by allocating a page
>>>> for each fault indiscriminately
>>>> we will be allocating a new page for each faulting virtual address within a
>>>> VA range belonging to the same BO
>>>> and this is obviously too much and not the intention. Should I instead use
>>>> let's say a hashtable with the hash
>>>> key being faulting BO address to actually keep allocating and reusing same
>>>> dummy zero page per GEM BO
>>>> (or for that matter DRM file object address for non imported BOs) ?
>>> Why do we need a hashtable? All the sw structures to track this should
>>> still be around:
>>> - if gem_bo->dma_buf is set the buffer is currently exported as a dma-buf,
>>>     so defensively allocate a per-bo page
>>> - otherwise allocate a per-file page
>>
>> That's exactly what we have in the current implementation
>>
>>
>>> Or is the idea to save the struct page * pointer? That feels a bit like
>>> over-optimizing stuff. Better to have a simple implementation first and
>>> then tune it if (and only if) any part of it becomes a problem for normal
>>> usage.
>>
>> Exactly - the idea is to avoid adding extra pointer to drm_gem_object,
>> Christian suggested to instead keep a linked list of dummy pages to be
>> allocated on demand once we hit a vm_fault. I will then also prefault the entire
>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map them
>> to that single dummy page.
> This strongly feels like premature optimization. If you're worried about
> the overhead on amdgpu, pay down the debt by removing one of the redundant
> pointers between gem and ttm bo structs (I think we still have some) :-)
>
> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> pointer just because" games with hashtables.
> -Daniel


Well, if you and Christian can agree on this approach and suggest which pointer
is redundant and can be removed from the GEM struct, so that we can use the
'credit' to add the dummy page to GEM, I will be happy to follow through.

P.S. The hash table is off the table anyway; we are only talking about a linked
list here. By prefaulting the entire VA range for a vmf->vma I will avoid
redundant page faults into the same VMA VA range, and so I don't need to search
for and reuse an existing dummy page but can simply create a new one for each
fault.

Andrey


>
>> Andrey
>>
>>
>>> -Daniel
>>>
>>>> Andrey
>>>>
>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>>>> index 0ac4566..ff3d39f 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>         }
>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>> +        ret = -ENOMEM;
>>>>>>>> +        goto out_prime_destroy;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         return file;
>>>>>>>>       out_prime_destroy:
>>>>>>>> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>             dev->driver->postclose(dev, file);
>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>> +
>>>>>>>>         drm_prime_destroy_file_private(&file->prime);
>>>>>>>>           WARN_ON(!list_empty(&file->event_list));
>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>>>>>>> index 1693aa7..987b45c 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>                 dma_buf, *handle);
>>>>>>>> +
>>>>>>>> +    if (!ret) {
>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>> +            ret = -ENOMEM;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>         mutex_unlock(&file_priv->prime.lock);
>>>>>>>>         if (ret)
>>>>>>>>             goto fail;
>>>>>>>> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct
>>>>>>>> drm_gem_object *obj, struct sg_table *sg)
>>>>>>>>             dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>>>>>         dma_buf = attach->dmabuf;
>>>>>>>>         dma_buf_detach(attach->dmabuf, attach);
>>>>>>>> +
>>>>>>>> +    __free_page(obj->dummy_page);
>>>>>>>> +
>>>>>>>>         /* remove the reference */
>>>>>>>>         dma_buf_put(dma_buf);
>>>>>>>>     }
>>>>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>>>>> index 716990b..2a011fc 100644
>>>>>>>> --- a/include/drm/drm_file.h
>>>>>>>> +++ b/include/drm/drm_file.h
>>>>>>>> @@ -346,6 +346,8 @@ struct drm_file {
>>>>>>>>          */
>>>>>>>>         struct drm_prime_file_private prime;
>>>>>>>>     +    struct page *dummy_page;
>>>>>>>> +
>>>>>>>>         /* private: */
>>>>>>>>     #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>>>>>         unsigned long lock_count; /* DRI1 legacy lock count */
>>>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>>>>> index 337a483..76a97a3 100644
>>>>>>>> --- a/include/drm/drm_gem.h
>>>>>>>> +++ b/include/drm/drm_gem.h
>>>>>>>> @@ -311,6 +311,8 @@ struct drm_gem_object {
>>>>>>>>          *
>>>>>>>>          */
>>>>>>>>         const struct drm_gem_object_funcs *funcs;
>>>>>>>> +
>>>>>>>> +    struct page *dummy_page;
>>>>>>>>     };
>>>>>>>>       /**
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-07 16:37                   ` Andrey Grodzovsky
@ 2021-01-08 14:26                     ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-08 14:26 UTC (permalink / raw)
  To: Daniel Vetter, christian.koenig
  Cc: amd-gfx, daniel.vetter, dri-devel, yuq825, gregkh, Alexander.Deucher

Hey Christian, just a ping.

Andrey

On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>
> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>> device is removed.
>>>>>>>> Uff, one page for each exported DMA-buf? That's not something we can do.
>>>>>>>>
>>>>>>>> We need to find a different approach here.
>>>>>>>>
>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>> this corner case, but why the linking ?
>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>
>>>>>> What we can do is to allocate a page on demand for each fault and link
>>>>>> the together in the bdev instead.
>>>>>>
>>>>>> And when the bdev is then finally destroyed after the last application
>>>>>> closed we can finally release all of them.
>>>>>>
>>>>>> Christian.
>>>>> Hey, started to implement this and then realized that by allocating a page
>>>>> for each fault indiscriminately
>>>>> we will be allocating a new page for each faulting virtual address within a
>>>>> VA range belonging the same BO
>>>>> and this is obviously too much and not the intention. Should I instead use
>>>>> let's say a hashtable with the hash
>>>>> key being faulting BO address to actually keep allocating and reusing same
>>>>> dummy zero page per GEM BO
>>>>> (or for that matter DRM file object address for non imported BOs) ?
>>>> Why do we need a hashtable? All the sw structures to track this should
>>>> still be around:
>>>> - if gem_bo->dma_buf is set the buffer is currently exported as a dma-buf,
>>>>     so defensively allocate a per-bo page
>>>> - otherwise allocate a per-file page
>>>
>>> That exactly what we have in current implementation
>>>
>>>
>>>> Or is the idea to save the struct page * pointer? That feels a bit like
>>>> over-optimizing stuff. Better to have a simple implementation first and
>>>> then tune it if (and only if) any part of it becomes a problem for normal
>>>> usage.
>>>
>>> Exactly - the idea is to avoid adding extra pointer to drm_gem_object,
>>> Christian suggested to instead keep a linked list of dummy pages to be
>>> allocated on demand once we hit a vm_fault. I will then also prefault the 
>>> entire
>>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map them
>>> to that single dummy page.
>> This strongly feels like premature optimization. If you're worried about
>> the overhead on amdgpu, pay down the debt by removing one of the redundant
>> pointers between gem and ttm bo structs (I think we still have some) :-)
>>
>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>> pointer just because" games with hashtables.
>> -Daniel
>
>
> Well, if you and Christian can agree on this approach and suggest maybe what 
> pointer is
> redundant and can be removed from GEM struct so we can use the 'credit' to add 
> the dummy page
> to GEM I will be happy to follow through.
>
> P.S Hash table is off the table anyway and we are talking only about linked 
> list here since by prefaulting
> the entire VA range for a vmf->vma i will be avoiding redundant page faults to 
> same VMA VA range and so
> don't need to search and reuse an existing dummy page but simply create a new 
> one for each next fault.
>
> Andrey
>
>
>>
>>> Andrey
>>>
>>>
>>>> -Daniel
>>>>
>>>>> Andrey
>>>>>
>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>>>>>>>>> ---
>>>>>>>>>     drivers/gpu/drm/drm_file.c  |  8 ++++++++
>>>>>>>>>     drivers/gpu/drm/drm_prime.c | 10 ++++++++++
>>>>>>>>>     include/drm/drm_file.h      |  2 ++
>>>>>>>>>     include/drm/drm_gem.h       |  2 ++
>>>>>>>>>     4 files changed, 22 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
>>>>>>>>> index 0ac4566..ff3d39f 100644
>>>>>>>>> --- a/drivers/gpu/drm/drm_file.c
>>>>>>>>> +++ b/drivers/gpu/drm/drm_file.c
>>>>>>>>> @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor 
>>>>>>>>> *minor)
>>>>>>>>>                 goto out_prime_destroy;
>>>>>>>>>         }
>>>>>>>>>     +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>> +    if (!file->dummy_page) {
>>>>>>>>> +        ret = -ENOMEM;
>>>>>>>>> +        goto out_prime_destroy;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>         return file;
>>>>>>>>>       out_prime_destroy:
>>>>>>>>> @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
>>>>>>>>>         if (dev->driver->postclose)
>>>>>>>>>             dev->driver->postclose(dev, file);
>>>>>>>>>     +    __free_page(file->dummy_page);
>>>>>>>>> +
>>>>>>>>> drm_prime_destroy_file_private(&file->prime);
>>>>>>>>> WARN_ON(!list_empty(&file->event_list));
>>>>>>>>> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
>>>>>>>>> index 1693aa7..987b45c 100644
>>>>>>>>> --- a/drivers/gpu/drm/drm_prime.c
>>>>>>>>> +++ b/drivers/gpu/drm/drm_prime.c
>>>>>>>>> @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device 
>>>>>>>>> *dev,
>>>>>>>>>           ret = drm_prime_add_buf_handle(&file_priv->prime,
>>>>>>>>>                 dma_buf, *handle);
>>>>>>>>> +
>>>>>>>>> +    if (!ret) {
>>>>>>>>> +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
>>>>>>>>> +        if (!obj->dummy_page)
>>>>>>>>> +            ret = -ENOMEM;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> mutex_unlock(&file_priv->prime.lock);
>>>>>>>>>         if (ret)
>>>>>>>>>             goto fail;
>>>>>>>>> @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct
>>>>>>>>> drm_gem_object *obj, struct sg_table *sg)
>>>>>>>>>             dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
>>>>>>>>>         dma_buf = attach->dmabuf;
>>>>>>>>>         dma_buf_detach(attach->dmabuf, attach);
>>>>>>>>> +
>>>>>>>>> +    __free_page(obj->dummy_page);
>>>>>>>>> +
>>>>>>>>>         /* remove the reference */
>>>>>>>>>         dma_buf_put(dma_buf);
>>>>>>>>>     }
>>>>>>>>> diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
>>>>>>>>> index 716990b..2a011fc 100644
>>>>>>>>> --- a/include/drm/drm_file.h
>>>>>>>>> +++ b/include/drm/drm_file.h
>>>>>>>>> @@ -346,6 +346,8 @@ struct drm_file {
>>>>>>>>>          */
>>>>>>>>>         struct drm_prime_file_private prime;
>>>>>>>>>     +    struct page *dummy_page;
>>>>>>>>> +
>>>>>>>>>         /* private: */
>>>>>>>>>     #if IS_ENABLED(CONFIG_DRM_LEGACY)
>>>>>>>>>         unsigned long lock_count; /* DRI1 legacy lock count */
>>>>>>>>> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
>>>>>>>>> index 337a483..76a97a3 100644
>>>>>>>>> --- a/include/drm/drm_gem.h
>>>>>>>>> +++ b/include/drm/drm_gem.h
>>>>>>>>> @@ -311,6 +311,8 @@ struct drm_gem_object {
>>>>>>>>>          *
>>>>>>>>>          */
>>>>>>>>>         const struct drm_gem_object_funcs *funcs;
>>>>>>>>> +
>>>>>>>>> +    struct page *dummy_page;
>>>>>>>>>     };
>>>>>>>>>       /**
>>>>>>> _______________________________________________
>>>>>>> amd-gfx mailing list
>>>>>>> amd-gfx@lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>
>>>>>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-08 14:26                     ` Andrey Grodzovsky
@ 2021-01-08 14:33                       ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2021-01-08 14:33 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: amd-gfx, daniel.vetter, dri-devel, yuq825, gregkh, Alexander.Deucher

Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> Hey Christian, just a ping.

Was there any question for me here?

As far as I can see the best approach would still be to fill the VMA 
with a single dummy page and avoid pointers in the GEM object.

Christian.

>
> Andrey
>
> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>
>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>> device is removed.
>>>>>>>>> Uff, one page for each exported DMA-buf? That's not something 
>>>>>>>>> we can do.
>>>>>>>>>
>>>>>>>>> We need to find a different approach here.
>>>>>>>>>
>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>> this corner case, but why the linking ?
>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>
>>>>>>> What we can do is to allocate a page on demand for each fault 
>>>>>>> and link
>>>>>>> the together in the bdev instead.
>>>>>>>
>>>>>>> And when the bdev is then finally destroyed after the last 
>>>>>>> application
>>>>>>> closed we can finally release all of them.
>>>>>>>
>>>>>>> Christian.
>>>>>> Hey, started to implement this and then realized that by 
>>>>>> allocating a page
>>>>>> for each fault indiscriminately
>>>>>> we will be allocating a new page for each faulting virtual 
>>>>>> address within a
>>>>>> VA range belonging the same BO
>>>>>> and this is obviously too much and not the intention. Should I 
>>>>>> instead use
>>>>>> let's say a hashtable with the hash
>>>>>> key being faulting BO address to actually keep allocating and 
>>>>>> reusing same
>>>>>> dummy zero page per GEM BO
>>>>>> (or for that matter DRM file object address for non imported BOs) ?
>>>>> Why do we need a hashtable? All the sw structures to track this 
>>>>> should
>>>>> still be around:
>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as a 
>>>>> dma-buf,
>>>>>     so defensively allocate a per-bo page
>>>>> - otherwise allocate a per-file page
>>>>
>>>> That exactly what we have in current implementation
>>>>
>>>>
>>>>> Or is the idea to save the struct page * pointer? That feels a bit 
>>>>> like
>>>>> over-optimizing stuff. Better to have a simple implementation 
>>>>> first and
>>>>> then tune it if (and only if) any part of it becomes a problem for 
>>>>> normal
>>>>> usage.
>>>>
>>>> Exactly - the idea is to avoid adding extra pointer to drm_gem_object,
>>>> Christian suggested to instead keep a linked list of dummy pages to be
>>>> allocated on demand once we hit a vm_fault. I will then also 
>>>> prefault the entire
>>>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map them
>>>> to that single dummy page.
>>> This strongly feels like premature optimization. If you're worried 
>>> about
>>> the overhead on amdgpu, pay down the debt by removing one of the 
>>> redundant
>>> pointers between gem and ttm bo structs (I think we still have some) 
>>> :-)
>>>
>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>> pointer just because" games with hashtables.
>>> -Daniel
>>
>>
>> Well, if you and Christian can agree on this approach and suggest 
>> maybe what pointer is
>> redundant and can be removed from GEM struct so we can use the 
>> 'credit' to add the dummy page
>> to GEM I will be happy to follow through.
>>
>> P.S Hash table is off the table anyway and we are talking only about 
>> linked list here since by prefaulting
>> the entire VA range for a vmf->vma i will be avoiding redundant page 
>> faults to same VMA VA range and so
>> don't need to search and reuse an existing dummy page but simply 
>> create a new one for each next fault.
>>
>> Andrey

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2021-01-08 14:33                       ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2021-01-08 14:33 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: robh, amd-gfx, daniel.vetter, dri-devel, eric, ppaalanen, yuq825,
	gregkh, Alexander.Deucher, Harry.Wentland, l.stach

Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> Hey Christian, just a ping.

Was there any question for me here?

As far as I can see the best approach would still be to fill the VMA 
with a single dummy page and avoid pointers in the GEM object.

Christian.

>
> Andrey
>
> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>
>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>> device is removed.
>>>>>>>>> Uff, one page for each exported DMA-buf? That's not something 
>>>>>>>>> we can do.
>>>>>>>>>
>>>>>>>>> We need to find a different approach here.
>>>>>>>>>
>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>> this corner case, but why the linking ?
>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>
>>>>>>> What we can do is to allocate a page on demand for each fault 
>>>>>>> and link
>>>>>>> the together in the bdev instead.
>>>>>>>
>>>>>>> And when the bdev is then finally destroyed after the last 
>>>>>>> application
>>>>>>> closed we can finally release all of them.
>>>>>>>
>>>>>>> Christian.
>>>>>> Hey, started to implement this and then realized that by 
>>>>>> allocating a page
>>>>>> for each fault indiscriminately
>>>>>> we will be allocating a new page for each faulting virtual 
>>>>>> address within a
>>>>>> VA range belonging the same BO
>>>>>> and this is obviously too much and not the intention. Should I 
>>>>>> instead use
>>>>>> let's say a hashtable with the hash
>>>>>> key being faulting BO address to actually keep allocating and 
>>>>>> reusing same
>>>>>> dummy zero page per GEM BO
>>>>>> (or for that matter DRM file object address for non imported BOs) ?
>>>>> Why do we need a hashtable? All the sw structures to track this 
>>>>> should
>>>>> still be around:
>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as a 
>>>>> dma-buf,
>>>>>     so defensively allocate a per-bo page
>>>>> - otherwise allocate a per-file page
>>>>
>>>> That exactly what we have in current implementation
>>>>
>>>>
>>>>> Or is the idea to save the struct page * pointer? That feels a bit 
>>>>> like
>>>>> over-optimizing stuff. Better to have a simple implementation 
>>>>> first and
>>>>> then tune it if (and only if) any part of it becomes a problem for 
>>>>> normal
>>>>> usage.
>>>>
>>>> Exactly - the idea is to avoid adding extra pointer to drm_gem_object,
>>>> Christian suggested to instead keep a linked list of dummy pages to be
>>>> allocated on demand once we hit a vm_fault. I will then also 
>>>> prefault the entire
>>>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map them
>>>> to that single dummy page.
>>> This strongly feels like premature optimization. If you're worried 
>>> about
>>> the overhead on amdgpu, pay down the debt by removing one of the 
>>> redundant
>>> pointers between gem and ttm bo structs (I think we still have some) 
>>> :-)
>>>
>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>> pointer just because" games with hashtables.
>>> -Daniel
>>
>>
>> Well, if you and Christian can agree on this approach and can perhaps
>> suggest which pointer is redundant and can be removed from the GEM struct,
>> so we can use the 'credit' to add the dummy page to GEM, I will be happy
>> to follow through.
>>
>> P.S. The hash table is off the table anyway; we are only talking about a
>> linked list here, since by prefaulting the entire VA range for a vmf->vma
>> I will be avoiding redundant page faults into the same VMA VA range, and
>> so don't need to search for and reuse an existing dummy page but can
>> simply create a new one for each new fault.
>>
>> Andrey
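
The prefault scheme described above (one dummy page shared by every faulting
address inside a VMA, instead of one page per fault) can be mocked up in
userspace roughly as follows. This is only an illustrative sketch:
`mock_vma` and `mock_fault` are invented stand-ins for the kernel's
`vm_area_struct` and fault handler, not code from the patch set.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* Invented stand-in for vm_area_struct: a page-aligned byte range plus a
 * simulated page table and the single shared dummy page. */
struct mock_vma {
    unsigned long vm_start, vm_end;
    void **pte;          /* one slot per page in [vm_start, vm_end) */
    void *dummy_page;    /* NULL until the first fault */
};

/* On the first fault, allocate ONE zeroed dummy page and "prefault" every
 * page in the VMA to it; later faults find the page already mapped and the
 * handler allocates nothing further. */
static int mock_fault(struct mock_vma *vma, unsigned long addr)
{
    unsigned long npages = (vma->vm_end - vma->vm_start) / PAGE_SIZE;
    unsigned long idx = (addr - vma->vm_start) / PAGE_SIZE;

    if (idx >= npages)
        return -1;                               /* fault outside the VMA */
    if (!vma->dummy_page) {
        vma->dummy_page = calloc(1, PAGE_SIZE);  /* alloc_page() stand-in */
        if (!vma->dummy_page)
            return -1;
        for (unsigned long i = 0; i < npages; i++)
            vma->pte[i] = vma->dummy_page;       /* map all PTEs to one page */
    }
    return 0;
}
```

With this shape a multi-page VMA that faults at several different addresses
still ends up with exactly one allocation behind all of its PTEs, which is
the memory bound being argued for in the thread.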

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-08 14:33                       ` Christian König
@ 2021-01-08 14:46                         ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-08 14:46 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: amd-gfx, daniel.vetter, dri-devel, yuq825, gregkh, Alexander.Deucher

Daniel had some objections to this (see below), and so I guess I need you both
to agree on the approach before I proceed.

Andrey

On 1/8/21 9:33 AM, Christian König wrote:
> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>> Hey Christian, just a ping.
>
> Was there any question for me here?
>
> As far as I can see the best approach would still be to fill the VMA with a 
> single dummy page and avoid pointers in the GEM object.
>
> Christian.
>
>>
>> Andrey
>>
>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>
>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>> device is removed.
>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not something we can do.
>>>>>>>>>>
>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>
>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>> this corner case, but why the linking?
>>>>>>>>> Shouldn't drm_prime_gem_destroy be a good enough place to free?
>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>
>>>>>>>> What we can do is to allocate a page on demand for each fault and link
>>>>>>>> them together in the bdev instead.
>>>>>>>>
>>>>>>>> And when the bdev is then finally destroyed after the last application
>>>>>>>> closed we can finally release all of them.
>>>>>>>>
>>>>>>>> Christian.
>>>>>>> Hey, started to implement this and then realized that by allocating a page
>>>>>>> for each fault indiscriminately
>>>>>>> we will be allocating a new page for each faulting virtual address within a
>>>>>>> VA range belonging to the same BO
>>>>>>> and this is obviously too much and not the intention. Should I instead use
>>>>>>> let's say a hashtable with the hash
>>>>>>> key being faulting BO address to actually keep allocating and reusing same
>>>>>>> dummy zero page per GEM BO
>>>>>>> (or for that matter DRM file object address for non imported BOs) ?
>>>>>> Why do we need a hashtable? All the sw structures to track this should
>>>>>> still be around:
>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as a dma-buf,
>>>>>>     so defensively allocate a per-bo page
>>>>>> - otherwise allocate a per-file page
>>>>>
>>>>> That's exactly what we have in the current implementation.
>>>>>
>>>>>
>>>>>> Or is the idea to save the struct page * pointer? That feels a bit like
>>>>>> over-optimizing stuff. Better to have a simple implementation first and
>>>>>> then tune it if (and only if) any part of it becomes a problem for normal
>>>>>> usage.
>>>>>
>>>>> Exactly - the idea is to avoid adding an extra pointer to drm_gem_object;
>>>>> Christian suggested to instead keep a linked list of dummy pages to be
>>>>> allocated on demand once we hit a vm_fault. I will then also prefault the
>>>>> entire VA range from vma->vm_start to vma->vm_end and map them
>>>>> to that single dummy page.
>>>> This strongly feels like premature optimization. If you're worried about
>>>> the overhead on amdgpu, pay down the debt by removing one of the redundant
>>>> pointers between gem and ttm bo structs (I think we still have some) :-)
>>>>
>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>> pointer just because" games with hashtables.
>>>> -Daniel
>>>
>>>
>>> Well, if you and Christian can agree on this approach and can perhaps
>>> suggest which pointer is redundant and can be removed from the GEM struct,
>>> so we can use the 'credit' to add the dummy page to GEM, I will be happy
>>> to follow through.
>>>
>>> P.S. The hash table is off the table anyway; we are only talking about a
>>> linked list here, since by prefaulting the entire VA range for a vmf->vma
>>> I will be avoiding redundant page faults into the same VMA VA range, and
>>> so don't need to search for and reuse an existing dummy page but can
>>> simply create a new one for each new fault.
>>>
>>> Andrey
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2021-01-08 14:46                         ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-08 14:46 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: robh, amd-gfx, daniel.vetter, dri-devel, eric, ppaalanen, yuq825,
	gregkh, Alexander.Deucher, Harry.Wentland, l.stach

Daniel had some objections to this (see below), and so I guess I need you both
to agree on the approach before I proceed.

Andrey

On 1/8/21 9:33 AM, Christian König wrote:
> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>> Hey Christian, just a ping.
>
> Was there any question for me here?
>
> As far as I can see the best approach would still be to fill the VMA with a 
> single dummy page and avoid pointers in the GEM object.
>
> Christian.
>
>>
>> Andrey
>>
>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>
>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>> device is removed.
>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not something we can do.
>>>>>>>>>>
>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>
>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>> this corner case, but why the linking?
>>>>>>>>> Shouldn't drm_prime_gem_destroy be a good enough place to free?
>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>
>>>>>>>> What we can do is to allocate a page on demand for each fault and link
>>>>>>>> them together in the bdev instead.
>>>>>>>>
>>>>>>>> And when the bdev is then finally destroyed after the last application
>>>>>>>> closed we can finally release all of them.
>>>>>>>>
>>>>>>>> Christian.
>>>>>>> Hey, started to implement this and then realized that by allocating a page
>>>>>>> for each fault indiscriminately
>>>>>>> we will be allocating a new page for each faulting virtual address within a
>>>>>>> VA range belonging to the same BO
>>>>>>> and this is obviously too much and not the intention. Should I instead use
>>>>>>> let's say a hashtable with the hash
>>>>>>> key being faulting BO address to actually keep allocating and reusing same
>>>>>>> dummy zero page per GEM BO
>>>>>>> (or for that matter DRM file object address for non imported BOs) ?
>>>>>> Why do we need a hashtable? All the sw structures to track this should
>>>>>> still be around:
>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as a dma-buf,
>>>>>>     so defensively allocate a per-bo page
>>>>>> - otherwise allocate a per-file page
>>>>>
>>>>> That's exactly what we have in the current implementation.
>>>>>
>>>>>
>>>>>> Or is the idea to save the struct page * pointer? That feels a bit like
>>>>>> over-optimizing stuff. Better to have a simple implementation first and
>>>>>> then tune it if (and only if) any part of it becomes a problem for normal
>>>>>> usage.
>>>>>
>>>>> Exactly - the idea is to avoid adding an extra pointer to drm_gem_object;
>>>>> Christian suggested to instead keep a linked list of dummy pages to be
>>>>> allocated on demand once we hit a vm_fault. I will then also prefault the
>>>>> entire VA range from vma->vm_start to vma->vm_end and map them
>>>>> to that single dummy page.
>>>> This strongly feels like premature optimization. If you're worried about
>>>> the overhead on amdgpu, pay down the debt by removing one of the redundant
>>>> pointers between gem and ttm bo structs (I think we still have some) :-)
>>>>
>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>> pointer just because" games with hashtables.
>>>> -Daniel
>>>
>>>
>>> Well, if you and Christian can agree on this approach and can perhaps
>>> suggest which pointer is redundant and can be removed from the GEM struct,
>>> so we can use the 'credit' to add the dummy page to GEM, I will be happy
>>> to follow through.
>>>
>>> P.S. The hash table is off the table anyway; we are only talking about a
>>> linked list here, since by prefaulting the entire VA range for a vmf->vma
>>> I will be avoiding redundant page faults into the same VMA VA range, and
>>> so don't need to search for and reuse an existing dummy page but can
>>> simply create a new one for each new fault.
>>>
>>> Andrey
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-08 14:46                         ` Andrey Grodzovsky
@ 2021-01-08 14:52                           ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2021-01-08 14:52 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: amd-gfx, daniel.vetter, dri-devel, yuq825, gregkh, Alexander.Deucher

Mhm, I'm not aware of any leftover pointer between TTM and GEM, and we
worked quite hard on reducing the size of the amdgpu_bo, so another 
extra pointer just for that corner case would suck quite a bit.

Christian.

Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> Daniel had some objections to this (see below), and so I guess I need
> you both to agree on the approach before I proceed.
>
> Andrey
>
> On 1/8/21 9:33 AM, Christian König wrote:
>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>> Hey Christian, just a ping.
>>
>> Was there any question for me here?
>>
>> As far as I can see the best approach would still be to fill the VMA 
>> with a single dummy page and avoid pointers in the GEM object.
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>
>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>> device is removed.
>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not 
>>>>>>>>>>> something we can do.
>>>>>>>>>>>
>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>
>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>> this corner case, but why the linking?
>>>>>>>>>> Shouldn't drm_prime_gem_destroy be a good enough place to free?
>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>
>>>>>>>>> What we can do is to allocate a page on demand for each fault 
>>>>>>>>> and link
>>>>>>>>> them together in the bdev instead.
>>>>>>>>>
>>>>>>>>> And when the bdev is then finally destroyed after the last 
>>>>>>>>> application
>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>> Hey, started to implement this and then realized that by 
>>>>>>>> allocating a page
>>>>>>>> for each fault indiscriminately
>>>>>>>> we will be allocating a new page for each faulting virtual 
>>>>>>>> address within a
>>>>>>>> VA range belonging to the same BO
>>>>>>>> and this is obviously too much and not the intention. Should I 
>>>>>>>> instead use
>>>>>>>> let's say a hashtable with the hash
>>>>>>>> key being faulting BO address to actually keep allocating and 
>>>>>>>> reusing same
>>>>>>>> dummy zero page per GEM BO
>>>>>>>> (or for that matter DRM file object address for non imported 
>>>>>>>> BOs) ?
>>>>>>> Why do we need a hashtable? All the sw structures to track this 
>>>>>>> should
>>>>>>> still be around:
>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as 
>>>>>>> a dma-buf,
>>>>>>>     so defensively allocate a per-bo page
>>>>>>> - otherwise allocate a per-file page
>>>>>>
>>>>>> That's exactly what we have in the current implementation.
>>>>>>
>>>>>>
>>>>>>> Or is the idea to save the struct page * pointer? That feels a 
>>>>>>> bit like
>>>>>>> over-optimizing stuff. Better to have a simple implementation 
>>>>>>> first and
>>>>>>> then tune it if (and only if) any part of it becomes a problem 
>>>>>>> for normal
>>>>>>> usage.
>>>>>>
>>>>>> Exactly - the idea is to avoid adding an extra pointer to
>>>>>> drm_gem_object; Christian suggested to instead keep a linked list of
>>>>>> dummy pages to be allocated on demand once we hit a vm_fault. I will
>>>>>> then also prefault the entire VA range from vma->vm_start to
>>>>>> vma->vm_end and map them to that single dummy page.
>>>>> This strongly feels like premature optimization. If you're worried 
>>>>> about
>>>>> the overhead on amdgpu, pay down the debt by removing one of the 
>>>>> redundant
>>>>> pointers between gem and ttm bo structs (I think we still have 
>>>>> some) :-)
>>>>>
>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>> pointer just because" games with hashtables.
>>>>> -Daniel
>>>>
>>>>
>>>> Well, if you and Christian can agree on this approach and can perhaps
>>>> suggest which pointer is redundant and can be removed from the GEM
>>>> struct, so we can use the 'credit' to add the dummy page to GEM, I will
>>>> be happy to follow through.
>>>>
>>>> P.S. The hash table is off the table anyway; we are only talking about
>>>> a linked list here, since by prefaulting the entire VA range for a
>>>> vmf->vma I will be avoiding redundant page faults into the same VMA VA
>>>> range, and so don't need to search for and reuse an existing dummy page
>>>> but can simply create a new one for each new fault.
>>>>
>>>> Andrey
>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2021-01-08 14:52                           ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2021-01-08 14:52 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: robh, amd-gfx, daniel.vetter, dri-devel, eric, ppaalanen, yuq825,
	gregkh, Alexander.Deucher, Harry.Wentland, l.stach

Mhm, I'm not aware of any leftover pointer between TTM and GEM, and we
worked quite hard on reducing the size of the amdgpu_bo, so another 
extra pointer just for that corner case would suck quite a bit.

Christian.

Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> Daniel had some objections to this (see below), and so I guess I need
> you both to agree on the approach before I proceed.
>
> Andrey
>
> On 1/8/21 9:33 AM, Christian König wrote:
>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>> Hey Christian, just a ping.
>>
>> Was there any question for me here?
>>
>> As far as I can see the best approach would still be to fill the VMA 
>> with a single dummy page and avoid pointers in the GEM object.
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>
>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>> device is removed.
>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not 
>>>>>>>>>>> something we can do.
>>>>>>>>>>>
>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>
>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>> this corner case, but why the linking?
>>>>>>>>>> Shouldn't drm_prime_gem_destroy be a good enough place to free?
>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>
>>>>>>>>> What we can do is to allocate a page on demand for each fault 
>>>>>>>>> and link
>>>>>>>>> them together in the bdev instead.
>>>>>>>>>
>>>>>>>>> And when the bdev is then finally destroyed after the last 
>>>>>>>>> application
>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>> Hey, started to implement this and then realized that by 
>>>>>>>> allocating a page
>>>>>>>> for each fault indiscriminately
>>>>>>>> we will be allocating a new page for each faulting virtual 
>>>>>>>> address within a
>>>>>>>> VA range belonging to the same BO
>>>>>>>> and this is obviously too much and not the intention. Should I 
>>>>>>>> instead use
>>>>>>>> let's say a hashtable with the hash
>>>>>>>> key being faulting BO address to actually keep allocating and 
>>>>>>>> reusing same
>>>>>>>> dummy zero page per GEM BO
>>>>>>>> (or for that matter DRM file object address for non imported 
>>>>>>>> BOs) ?
>>>>>>> Why do we need a hashtable? All the sw structures to track this 
>>>>>>> should
>>>>>>> still be around:
>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as 
>>>>>>> a dma-buf,
>>>>>>>     so defensively allocate a per-bo page
>>>>>>> - otherwise allocate a per-file page
>>>>>>
>>>>>> That's exactly what we have in the current implementation.
>>>>>>
>>>>>>
>>>>>>> Or is the idea to save the struct page * pointer? That feels a 
>>>>>>> bit like
>>>>>>> over-optimizing stuff. Better to have a simple implementation 
>>>>>>> first and
>>>>>>> then tune it if (and only if) any part of it becomes a problem 
>>>>>>> for normal
>>>>>>> usage.
>>>>>>
>>>>>> Exactly - the idea is to avoid adding an extra pointer to
>>>>>> drm_gem_object; Christian suggested to instead keep a linked list of
>>>>>> dummy pages to be allocated on demand once we hit a vm_fault. I will
>>>>>> then also prefault the entire VA range from vma->vm_start to
>>>>>> vma->vm_end and map them to that single dummy page.
>>>>> This strongly feels like premature optimization. If you're worried 
>>>>> about
>>>>> the overhead on amdgpu, pay down the debt by removing one of the 
>>>>> redundant
>>>>> pointers between gem and ttm bo structs (I think we still have 
>>>>> some) :-)
>>>>>
>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>> pointer just because" games with hashtables.
>>>>> -Daniel
>>>>
>>>>
>>>> Well, if you and Christian can agree on this approach and can perhaps
>>>> suggest which pointer is redundant and can be removed from the GEM
>>>> struct, so we can use the 'credit' to add the dummy page to GEM, I will
>>>> be happy to follow through.
>>>>
>>>> P.S. The hash table is off the table anyway; we are only talking about
>>>> a linked list here, since by prefaulting the entire VA range for a
>>>> vmf->vma I will be avoiding redundant page faults into the same VMA VA
>>>> range, and so don't need to search for and reuse an existing dummy page
>>>> but can simply create a new one for each new fault.
>>>>
>>>> Andrey
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-08 14:52                           ` Christian König
@ 2021-01-08 16:49                             ` Grodzovsky, Andrey
  -1 siblings, 0 replies; 212+ messages in thread
From: Grodzovsky, Andrey @ 2021-01-08 16:49 UTC (permalink / raw)
  To: Koenig, Christian, Daniel Vetter
  Cc: amd-gfx, daniel.vetter, dri-devel, yuq825, gregkh, Deucher, Alexander


[-- Attachment #1.1: Type: text/plain, Size: 6015 bytes --]

Ok then, I guess I will proceed with the dummy pages list implementation then.

Andrey
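
A minimal userspace mock-up of the bookkeeping agreed on above (hypothetical
names, not the actual patch): dummy pages are allocated on demand at fault
time and linked into a per-device list rather than stored in each GEM
object, then released in one sweep when the device is finally destroyed.

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SIZE 4096UL

/* Invented stand-ins for the per-device (bdev) dummy-page list. */
struct dummy_page {
    void *page;
    struct dummy_page *next;
};

struct mock_bdev {
    struct dummy_page *dummy_pages;   /* head of the on-demand list */
};

/* Called from the fault path: allocate a zeroed page and link it into the
 * device instead of adding a pointer to the GEM object. */
static void *bdev_get_dummy_page(struct mock_bdev *bdev)
{
    struct dummy_page *dp = malloc(sizeof(*dp));

    if (!dp)
        return NULL;
    dp->page = calloc(1, PAGE_SIZE);  /* alloc_page() stand-in */
    if (!dp->page) {
        free(dp);
        return NULL;
    }
    dp->next = bdev->dummy_pages;
    bdev->dummy_pages = dp;
    return dp->page;
}

/* Called when the device is finally torn down, after the last user has
 * closed: release every dummy page at once.  Returns how many were freed. */
static int bdev_release_dummy_pages(struct mock_bdev *bdev)
{
    int freed = 0;

    while (bdev->dummy_pages) {
        struct dummy_page *dp = bdev->dummy_pages;

        bdev->dummy_pages = dp->next;
        free(dp->page);
        free(dp);
        freed++;
    }
    return freed;
}
```

The design choice this illustrates is exactly the trade discussed in the
thread: the cost of tracking moves from a pointer in every BO to a small
list owned by the device, paid only in the hot-unplug corner case.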

________________________________
From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: 08 January 2021 09:52
To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object

Mhm, I'm not aware of any leftover pointer between TTM and GEM, and we
worked quite hard on reducing the size of the amdgpu_bo, so another
extra pointer just for that corner case would suck quite a bit.

Christian.

Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> Daniel had some objections to this (see below), and so I guess I need
> you both to agree on the approach before I proceed.
>
> Andrey
>
> On 1/8/21 9:33 AM, Christian König wrote:
>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>> Hey Christian, just a ping.
>>
>> Was there any question for me here?
>>
>> As far as I can see the best approach would still be to fill the VMA
>> with a single dummy page and avoid pointers in the GEM object.
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>
>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>> device is removed.
>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
>>>>>>>>>>> something we can do.
>>>>>>>>>>>
>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>
>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>> this corner case, but why the linking?
>>>>>>>>>> Shouldn't drm_prime_gem_destroy be a good enough place to free?
>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>
>>>>>>>>> What we can do is to allocate a page on demand for each fault
>>>>>>>>> and link
>>>>>>>>> them together in the bdev instead.
>>>>>>>>>
>>>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>>>> application
>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>> Hey, started to implement this and then realized that by
>>>>>>>> allocating a page
>>>>>>>> for each fault indiscriminately
>>>>>>>> we will be allocating a new page for each faulting virtual
>>>>>>>> address within a
>>>>>>>> VA range belonging to the same BO
>>>>>>>> and this is obviously too much and not the intention. Should I
>>>>>>>> instead use
>>>>>>>> let's say a hashtable with the hash
>>>>>>>> key being faulting BO address to actually keep allocating and
>>>>>>>> reusing same
>>>>>>>> dummy zero page per GEM BO
>>>>>>>> (or for that matter DRM file object address for non imported
>>>>>>>> BOs) ?
>>>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>>>> should
>>>>>>> still be around:
>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
>>>>>>> a dma-buf,
>>>>>>>     so defensively allocate a per-bo page
>>>>>>> - otherwise allocate a per-file page
>>>>>>
>>>>>> That's exactly what we have in the current implementation.
>>>>>>
>>>>>>
>>>>>>> Or is the idea to save the struct page * pointer? That feels a
>>>>>>> bit like
>>>>>>> over-optimizing stuff. Better to have a simple implementation
>>>>>>> first and
>>>>>>> then tune it if (and only if) any part of it becomes a problem
>>>>>>> for normal
>>>>>>> usage.
>>>>>>
>>>>>> Exactly - the idea is to avoid adding an extra pointer to
>>>>>> drm_gem_object; Christian suggested to instead keep a linked list of
>>>>>> dummy pages to be allocated on demand once we hit a vm_fault. I will
>>>>>> then also prefault the entire VA range from vma->vm_start to
>>>>>> vma->vm_end and map them to that single dummy page.
>>>>> This strongly feels like premature optimization. If you're worried
>>>>> about
>>>>> the overhead on amdgpu, pay down the debt by removing one of the
>>>>> redundant
>>>>> pointers between gem and ttm bo structs (I think we still have
>>>>> some) :-)
>>>>>
>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>> pointer just because" games with hashtables.
>>>>> -Daniel
>>>>
>>>>
>>>> Well, if you and Christian can agree on this approach and can perhaps
>>>> suggest which pointer is redundant and can be removed from the GEM
>>>> struct, so we can use the 'credit' to add the dummy page to GEM, I will
>>>> be happy to follow through.
>>>>
>>>> P.S. The hash table is off the table anyway; we are only talking about
>>>> a linked list here, since by prefaulting the entire VA range for a
>>>> vmf->vma I will be avoiding redundant page faults into the same VMA VA
>>>> range, and so don't need to search for and reuse an existing dummy page
>>>> but can simply create a new one for each new fault.
>>>>
>>>> Andrey
>>


[-- Attachment #1.2: Type: text/html, Size: 9885 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2021-01-08 16:49                             ` Grodzovsky, Andrey
  0 siblings, 0 replies; 212+ messages in thread
From: Grodzovsky, Andrey @ 2021-01-08 16:49 UTC (permalink / raw)
  To: Koenig, Christian, Daniel Vetter
  Cc: robh, amd-gfx, daniel.vetter, dri-devel, eric, ppaalanen, yuq825,
	gregkh, Deucher, Alexander, Wentland, Harry, l.stach


[-- Attachment #1.1: Type: text/plain, Size: 6015 bytes --]

Ok then, I guess I will proceed with the dummy pages list implementation then.

Andrey

________________________________
From: Koenig, Christian <Christian.Koenig@amd.com>
Sent: 08 January 2021 09:52
To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object

Mhm, I'm not aware of any leftover pointer between TTM and GEM, and we
worked quite hard on reducing the size of the amdgpu_bo, so another
extra pointer just for that corner case would suck quite a bit.

Christian.

Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> Daniel had some objections to this (see below), and so I guess I need
> you both to agree on the approach before I proceed.
>
> Andrey
>
> On 1/8/21 9:33 AM, Christian König wrote:
>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>> Hey Christian, just a ping.
>>
>> Was there any question for me here?
>>
>> As far as I can see the best approach would still be to fill the VMA
>> with a single dummy page and avoid pointers in the GEM object.
>>
>> Christian.
>>
>>>
>>> Andrey
>>>
>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>
>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>> device is removed.
>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
>>>>>>>>>>> something we can do.
>>>>>>>>>>>
>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>
>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>> this corner case, but why the linking?
>>>>>>>>>> Shouldn't drm_prime_gem_destroy be a good enough place to free?
>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>
>>>>>>>>> What we can do is to allocate a page on demand for each fault
>>>>>>>>> and link
>>>>>>>>> them together in the bdev instead.
>>>>>>>>>
>>>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>>>> application
>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>
>>>>>>>>> Christian.
>>>>>>>> Hey, started to implement this and then realized that by
>>>>>>>> allocating a page
>>>>>>>> for each fault indiscriminately
>>>>>>>> we will be allocating a new page for each faulting virtual
>>>>>>>> address within a
>>>>>>>> VA range belonging to the same BO
>>>>>>>> and this is obviously too much and not the intention. Should I
>>>>>>>> instead use
>>>>>>>> let's say a hashtable with the hash
>>>>>>>> key being faulting BO address to actually keep allocating and
>>>>>>>> reusing same
>>>>>>>> dummy zero page per GEM BO
>>>>>>>> (or for that matter DRM file object address for non imported
>>>>>>>> BOs) ?
>>>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>>>> should
>>>>>>> still be around:
>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
>>>>>>> a dma-buf,
>>>>>>>     so defensively allocate a per-bo page
>>>>>>> - otherwise allocate a per-file page
>>>>>>
>>>>>> That's exactly what we have in the current implementation
>>>>>>
>>>>>>
>>>>>>> Or is the idea to save the struct page * pointer? That feels a
>>>>>>> bit like
>>>>>>> over-optimizing stuff. Better to have a simple implementation
>>>>>>> first and
>>>>>>> then tune it if (and only if) any part of it becomes a problem
>>>>>>> for normal
>>>>>>> usage.
>>>>>>
>>>>>> Exactly - the idea is to avoid adding extra pointer to
>>>>>> drm_gem_object,
>>>>>> Christian suggested to instead keep a linked list of dummy pages
>>>>>> to be
>>>>>> allocated on demand once we hit a vm_fault. I will then also
>>>>>> prefault the entire
>>>>>> VA range from vma->vm_start to vma->vm_end and map
>>>>>> them
>>>>>> to that single dummy page.
>>>>> This strongly feels like premature optimization. If you're worried
>>>>> about
>>>>> the overhead on amdgpu, pay down the debt by removing one of the
>>>>> redundant
>>>>> pointers between gem and ttm bo structs (I think we still have
>>>>> some) :-)
>>>>>
>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>> pointer just because" games with hashtables.
>>>>> -Daniel
>>>>
>>>>
>>>> Well, if you and Christian can agree on this approach and suggest
>>>> maybe what pointer is
>>>> redundant and can be removed from GEM struct so we can use the
>>>> 'credit' to add the dummy page
>>>> to GEM I will be happy to follow through.
>>>>
>>>> P.S. The hash table is off the table anyway and we are talking only
>>>> about a linked list here, since by prefaulting
>>>> the entire VA range for a vmf->vma I will be avoiding redundant
>>>> page faults to the same VMA VA range and so
>>>> don't need to search and reuse an existing dummy page but simply
>>>> create a new one for each next fault.
>>>>
>>>> Andrey
>>



_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-08 16:49                             ` Grodzovsky, Andrey
@ 2021-01-11 16:13                               ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2021-01-11 16:13 UTC (permalink / raw)
  To: Grodzovsky, Andrey
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Deucher, Alexander,
	yuq825, Koenig, Christian

On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
> Ok then, I guess I will proceed with the dummy pages list implementation then.
> 
> Andrey
> 
> ________________________________
> From: Koenig, Christian <Christian.Koenig@amd.com>
> Sent: 08 January 2021 09:52
> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
> 
> Mhm, I'm not aware of any let over pointer between TTM and GEM and we
> worked quite hard on reducing the size of the amdgpu_bo, so another
> extra pointer just for that corner case would suck quite a bit.

We have a ton of other pointers in struct amdgpu_bo (or any of its
lower-level structs) which are fairly single-use, so I'm really not
seeing much point in making this a special case. It also means the
lifetime management becomes a bit iffy, since we can't throw away the
dummy page when the last reference to the bo is released (since we
don't track it there), but only when the last pointer to the device is
released. Potentially this means a pile of dangling pages hanging
around for too long.

If you need some ideas for redundant pointers:
- destroy callback (kinda not cool to not have this const anyway), we
  could refcount it all with the overall gem bo. Quite a bit of work.
- bdev pointer, if we move the device ttm stuff into struct drm_device, or
  create a common struct ttm_device, we can ditch that
- We could probably merge a few of the fields and find 8 bytes somewhere
- we still have 2 krefs, would probably need to fix that before we can
  merge the destroy callbacks

So there's plenty of room still, if the size of a bo struct is really that
critical. Imo it's not.
-Daniel


> 
> Christian.
> 
> Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> > Daniel had some objections to this (see bellow) and so I guess I need
> > you both to agree on the approach before I proceed.
> >
> > Andrey
> >
> > On 1/8/21 9:33 AM, Christian König wrote:
> >> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> >>> Hey Christian, just a ping.
> >>
> >> Was there any question for me here?
> >>
> >> As far as I can see the best approach would still be to fill the VMA
> >> with a single dummy page and avoid pointers in the GEM object.
> >>
> >> Christian.
> >>
> >>>
> >>> Andrey
> >>>
> >>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
> >>>>
> >>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
> >>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> >>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
> >>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> >>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
> >>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> >>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
> >>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> >>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
> >>>>>>>>>>>> device is removed.
> >>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
> >>>>>>>>>>> something we can do.
> >>>>>>>>>>>
> >>>>>>>>>>> We need to find a different approach here.
> >>>>>>>>>>>
> >>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
> >>>>>>>>>>> so they are freed when the device is finally reaped?
> >>>>>>>>>> For sure better to optimize and allocate on demand when we reach
> >>>>>>>>>> this corner case, but why the linking ?
> >>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
> >>>>>>>>> I want to avoid keeping the page in the GEM object.
> >>>>>>>>>
> >>>>>>>>> What we can do is to allocate a page on demand for each fault
> >>>>>>>>> and link
> >>>>>>>>> the together in the bdev instead.
> >>>>>>>>>
> >>>>>>>>> And when the bdev is then finally destroyed after the last
> >>>>>>>>> application
> >>>>>>>>> closed we can finally release all of them.
> >>>>>>>>>
> >>>>>>>>> Christian.
> >>>>>>>> Hey, started to implement this and then realized that by
> >>>>>>>> allocating a page
> >>>>>>>> for each fault indiscriminately
> >>>>>>>> we will be allocating a new page for each faulting virtual
> >>>>>>>> address within a
> >>>>>>>> VA range belonging the same BO
> >>>>>>>> and this is obviously too much and not the intention. Should I
> >>>>>>>> instead use
> >>>>>>>> let's say a hashtable with the hash
> >>>>>>>> key being faulting BO address to actually keep allocating and
> >>>>>>>> reusing same
> >>>>>>>> dummy zero page per GEM BO
> >>>>>>>> (or for that matter DRM file object address for non imported
> >>>>>>>> BOs) ?
> >>>>>>> Why do we need a hashtable? All the sw structures to track this
> >>>>>>> should
> >>>>>>> still be around:
> >>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
> >>>>>>> a dma-buf,
> >>>>>>>     so defensively allocate a per-bo page
> >>>>>>> - otherwise allocate a per-file page
> >>>>>>
> >>>>>> That exactly what we have in current implementation
> >>>>>>
> >>>>>>
> >>>>>>> Or is the idea to save the struct page * pointer? That feels a
> >>>>>>> bit like
> >>>>>>> over-optimizing stuff. Better to have a simple implementation
> >>>>>>> first and
> >>>>>>> then tune it if (and only if) any part of it becomes a problem
> >>>>>>> for normal
> >>>>>>> usage.
> >>>>>>
> >>>>>> Exactly - the idea is to avoid adding extra pointer to
> >>>>>> drm_gem_object,
> >>>>>> Christian suggested to instead keep a linked list of dummy pages
> >>>>>> to be
> >>>>>> allocated on demand once we hit a vm_fault. I will then also
> >>>>>> prefault the entire
> >>>>>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map
> >>>>>> them
> >>>>>> to that single dummy page.
> >>>>> This strongly feels like premature optimization. If you're worried
> >>>>> about
> >>>>> the overhead on amdgpu, pay down the debt by removing one of the
> >>>>> redundant
> >>>>> pointers between gem and ttm bo structs (I think we still have
> >>>>> some) :-)
> >>>>>
> >>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> >>>>> pointer just because" games with hashtables.
> >>>>> -Daniel
> >>>>
> >>>>
> >>>> Well, if you and Christian can agree on this approach and suggest
> >>>> maybe what pointer is
> >>>> redundant and can be removed from GEM struct so we can use the
> >>>> 'credit' to add the dummy page
> >>>> to GEM I will be happy to follow through.
> >>>>
> >>>> P.S Hash table is off the table anyway and we are talking only
> >>>> about linked list here since by prefaulting
> >>>> the entire VA range for a vmf->vma i will be avoiding redundant
> >>>> page faults to same VMA VA range and so
> >>>> don't need to search and reuse an existing dummy page but simply
> >>>> create a new one for each next fault.
> >>>>
> >>>> Andrey
> >>
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-11 16:13                               ` Daniel Vetter
@ 2021-01-11 16:15                                 ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2021-01-11 16:15 UTC (permalink / raw)
  To: Grodzovsky, Andrey
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Deucher, Alexander,
	yuq825, Koenig, Christian

On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
> > Ok then, I guess I will proceed with the dummy pages list implementation then.
> > 
> > Andrey
> > 
> > ________________________________
> > From: Koenig, Christian <Christian.Koenig@amd.com>
> > Sent: 08 January 2021 09:52
> > To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
> > Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
> > Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
> > 
> > Mhm, I'm not aware of any let over pointer between TTM and GEM and we
> > worked quite hard on reducing the size of the amdgpu_bo, so another
> > extra pointer just for that corner case would suck quite a bit.
> 
> We have a ton of other pointers in struct amdgpu_bo (or any of it's lower
> things) which are fairly single-use, so I'm really not much seeing the
> point in making this a special case. It also means the lifetime management
> becomes a bit iffy, since we can't throw away the dummy page then the last
> reference to the bo is released (since we don't track it there), but only
> when the last pointer to the device is released. Potentially this means a
> pile of dangling pages hanging around for too long.

Also if you really, really, really want to have this list, please don't
reinvent it since we have it already. drmm_ is exactly meant for resources
that should be freed when the final drm_device reference disappears.
-Daniel
 
> If you need some ideas for redundant pointers:
> - destroy callback (kinda not cool to not have this const anyway), we
>   could refcount it all with the overall gem bo. Quite a bit of work.
> - bdev pointer, if we move the device ttm stuff into struct drm_device, or
>   create a common struct ttm_device, we can ditch that
> - We could probably merge a few of the fields and find 8 bytes somewhere
> - we still have 2 krefs, would probably need to fix that before we can
>   merge the destroy callbacks
> 
> So there's plenty of room still, if the size of a bo struct is really that
> critical. Imo it's not.
> 
> 
> > 
> > Christian.
> > 
> > Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> > > Daniel had some objections to this (see bellow) and so I guess I need
> > > you both to agree on the approach before I proceed.
> > >
> > > Andrey
> > >
> > > On 1/8/21 9:33 AM, Christian König wrote:
> > >> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> > >>> Hey Christian, just a ping.
> > >>
> > >> Was there any question for me here?
> > >>
> > >> As far as I can see the best approach would still be to fill the VMA
> > >> with a single dummy page and avoid pointers in the GEM object.
> > >>
> > >> Christian.
> > >>
> > >>>
> > >>> Andrey
> > >>>
> > >>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
> > >>>>
> > >>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
> > >>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> > >>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
> > >>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> > >>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
> > >>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> > >>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
> > >>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > >>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
> > >>>>>>>>>>>> device is removed.
> > >>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
> > >>>>>>>>>>> something we can do.
> > >>>>>>>>>>>
> > >>>>>>>>>>> We need to find a different approach here.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
> > >>>>>>>>>>> so they are freed when the device is finally reaped?
> > >>>>>>>>>> For sure better to optimize and allocate on demand when we reach
> > >>>>>>>>>> this corner case, but why the linking ?
> > >>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
> > >>>>>>>>> I want to avoid keeping the page in the GEM object.
> > >>>>>>>>>
> > >>>>>>>>> What we can do is to allocate a page on demand for each fault
> > >>>>>>>>> and link
> > >>>>>>>>> the together in the bdev instead.
> > >>>>>>>>>
> > >>>>>>>>> And when the bdev is then finally destroyed after the last
> > >>>>>>>>> application
> > >>>>>>>>> closed we can finally release all of them.
> > >>>>>>>>>
> > >>>>>>>>> Christian.
> > >>>>>>>> Hey, started to implement this and then realized that by
> > >>>>>>>> allocating a page
> > >>>>>>>> for each fault indiscriminately
> > >>>>>>>> we will be allocating a new page for each faulting virtual
> > >>>>>>>> address within a
> > >>>>>>>> VA range belonging the same BO
> > >>>>>>>> and this is obviously too much and not the intention. Should I
> > >>>>>>>> instead use
> > >>>>>>>> let's say a hashtable with the hash
> > >>>>>>>> key being faulting BO address to actually keep allocating and
> > >>>>>>>> reusing same
> > >>>>>>>> dummy zero page per GEM BO
> > >>>>>>>> (or for that matter DRM file object address for non imported
> > >>>>>>>> BOs) ?
> > >>>>>>> Why do we need a hashtable? All the sw structures to track this
> > >>>>>>> should
> > >>>>>>> still be around:
> > >>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
> > >>>>>>> a dma-buf,
> > >>>>>>>     so defensively allocate a per-bo page
> > >>>>>>> - otherwise allocate a per-file page
> > >>>>>>
> > >>>>>> That exactly what we have in current implementation
> > >>>>>>
> > >>>>>>
> > >>>>>>> Or is the idea to save the struct page * pointer? That feels a
> > >>>>>>> bit like
> > >>>>>>> over-optimizing stuff. Better to have a simple implementation
> > >>>>>>> first and
> > >>>>>>> then tune it if (and only if) any part of it becomes a problem
> > >>>>>>> for normal
> > >>>>>>> usage.
> > >>>>>>
> > >>>>>> Exactly - the idea is to avoid adding extra pointer to
> > >>>>>> drm_gem_object,
> > >>>>>> Christian suggested to instead keep a linked list of dummy pages
> > >>>>>> to be
> > >>>>>> allocated on demand once we hit a vm_fault. I will then also
> > >>>>>> prefault the entire
> > >>>>>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map
> > >>>>>> them
> > >>>>>> to that single dummy page.
> > >>>>> This strongly feels like premature optimization. If you're worried
> > >>>>> about
> > >>>>> the overhead on amdgpu, pay down the debt by removing one of the
> > >>>>> redundant
> > >>>>> pointers between gem and ttm bo structs (I think we still have
> > >>>>> some) :-)
> > >>>>>
> > >>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> > >>>>> pointer just because" games with hashtables.
> > >>>>> -Daniel
> > >>>>
> > >>>>
> > >>>> Well, if you and Christian can agree on this approach and suggest
> > >>>> maybe what pointer is
> > >>>> redundant and can be removed from GEM struct so we can use the
> > >>>> 'credit' to add the dummy page
> > >>>> to GEM I will be happy to follow through.
> > >>>>
> > >>>> P.S Hash table is off the table anyway and we are talking only
> > >>>> about linked list here since by prefaulting
> > >>>> the entire VA range for a vmf->vma i will be avoiding redundant
> > >>>> page faults to same VMA VA range and so
> > >>>> don't need to search and reuse an existing dummy page but simply
> > >>>> create a new one for each next fault.
> > >>>>
> > >>>> Andrey
> > >>
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2021-01-11 16:15                                 ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2021-01-11 16:15 UTC (permalink / raw)
  To: Grodzovsky, Andrey
  Cc: robh, daniel.vetter, dri-devel, eric, ppaalanen, amd-gfx,
	Daniel Vetter, gregkh, Deucher, Alexander, yuq825, Wentland,
	Harry, Koenig, Christian, l.stach

On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
> > Ok then, I guess I will proceed with the dummy pages list implementation then.
> > 
> > Andrey
> > 
> > ________________________________
> > From: Koenig, Christian <Christian.Koenig@amd.com>
> > Sent: 08 January 2021 09:52
> > To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
> > Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
> > Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
> > 
> > Mhm, I'm not aware of any let over pointer between TTM and GEM and we
> > worked quite hard on reducing the size of the amdgpu_bo, so another
> > extra pointer just for that corner case would suck quite a bit.
> 
> We have a ton of other pointers in struct amdgpu_bo (or any of it's lower
> things) which are fairly single-use, so I'm really not much seeing the
> point in making this a special case. It also means the lifetime management
> becomes a bit iffy, since we can't throw away the dummy page then the last
> reference to the bo is released (since we don't track it there), but only
> when the last pointer to the device is released. Potentially this means a
> pile of dangling pages hanging around for too long.

Also if you really, really, really want to have this list, please don't
reinvent it since we have it already. drmm_ is exactly meant for resources
that should be freed when the final drm_device reference disappears.
-Daniel
 
> If you need some ideas for redundant pointers:
> - destroy callback (kinda not cool to not have this const anyway), we
>   could refcount it all with the overall gem bo. Quite a bit of work.
> - bdev pointer, if we move the device ttm stuff into struct drm_device, or
>   create a common struct ttm_device, we can ditch that
> - We could probably merge a few of the fields and find 8 bytes somewhere
> - we still have 2 krefs, would probably need to fix that before we can
>   merge the destroy callbacks
> 
> So there's plenty of room still, if the size of a bo struct is really that
> critical. Imo it's not.
> 
> 
> > 
> > Christian.
> > 
> > Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> > > Daniel had some objections to this (see bellow) and so I guess I need
> > > you both to agree on the approach before I proceed.
> > >
> > > Andrey
> > >
> > > On 1/8/21 9:33 AM, Christian König wrote:
> > >> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> > >>> Hey Christian, just a ping.
> > >>
> > >> Was there any question for me here?
> > >>
> > >> As far as I can see the best approach would still be to fill the VMA
> > >> with a single dummy page and avoid pointers in the GEM object.
> > >>
> > >> Christian.
> > >>
> > >>>
> > >>> Andrey
> > >>>
> > >>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
> > >>>>
> > >>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
> > >>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> > >>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
> > >>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> > >>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
> > >>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> > >>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
> > >>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > >>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
> > >>>>>>>>>>>> device is removed.
> > >>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
> > >>>>>>>>>>> something we can do.
> > >>>>>>>>>>>
> > >>>>>>>>>>> We need to find a different approach here.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
> > >>>>>>>>>>> so they are freed when the device is finally reaped?
> > >>>>>>>>>> For sure better to optimize and allocate on demand when we reach
> > >>>>>>>>>> this corner case, but why the linking ?
> > >>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
> > >>>>>>>>> I want to avoid keeping the page in the GEM object.
> > >>>>>>>>>
> > >>>>>>>>> What we can do is to allocate a page on demand for each fault
> > >>>>>>>>> and link
> > >>>>>>>>> the together in the bdev instead.
> > >>>>>>>>>
> > >>>>>>>>> And when the bdev is then finally destroyed after the last
> > >>>>>>>>> application
> > >>>>>>>>> closed we can finally release all of them.
> > >>>>>>>>>
> > >>>>>>>>> Christian.
> > >>>>>>>> Hey, started to implement this and then realized that by
> > >>>>>>>> allocating a page
> > >>>>>>>> for each fault indiscriminately
> > >>>>>>>> we will be allocating a new page for each faulting virtual
> > >>>>>>>> address within a
> > >>>>>>>> VA range belonging the same BO
> > >>>>>>>> and this is obviously too much and not the intention. Should I
> > >>>>>>>> instead use
> > >>>>>>>> let's say a hashtable with the hash
> > >>>>>>>> key being faulting BO address to actually keep allocating and
> > >>>>>>>> reusing same
> > >>>>>>>> dummy zero page per GEM BO
> > >>>>>>>> (or for that matter DRM file object address for non imported
> > >>>>>>>> BOs) ?
> > >>>>>>> Why do we need a hashtable? All the sw structures to track this
> > >>>>>>> should
> > >>>>>>> still be around:
> > >>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
> > >>>>>>> a dma-buf,
> > >>>>>>>     so defensively allocate a per-bo page
> > >>>>>>> - otherwise allocate a per-file page
> > >>>>>>
> > >>>>>> That exactly what we have in current implementation
> > >>>>>>
> > >>>>>>
> > >>>>>>> Or is the idea to save the struct page * pointer? That feels a
> > >>>>>>> bit like
> > >>>>>>> over-optimizing stuff. Better to have a simple implementation
> > >>>>>>> first and
> > >>>>>>> then tune it if (and only if) any part of it becomes a problem
> > >>>>>>> for normal
> > >>>>>>> usage.
> > >>>>>>
> > >>>>>> Exactly - the idea is to avoid adding extra pointer to
> > >>>>>> drm_gem_object,
> > >>>>>> Christian suggested to instead keep a linked list of dummy pages
> > >>>>>> to be
> > >>>>>> allocated on demand once we hit a vm_fault. I will then also
> > >>>>>> prefault the entire
> > >>>>>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map
> > >>>>>> them
> > >>>>>> to that single dummy page.
> > >>>>> This strongly feels like premature optimization. If you're worried
> > >>>>> about
> > >>>>> the overhead on amdgpu, pay down the debt by removing one of the
> > >>>>> redundant
> > >>>>> pointers between gem and ttm bo structs (I think we still have
> > >>>>> some) :-)
> > >>>>>
> > >>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> > >>>>> pointer just because" games with hashtables.
> > >>>>> -Daniel
> > >>>>
> > >>>>
> > >>>> Well, if you and Christian can agree on this approach and suggest
> > >>>> maybe what pointer is
> > >>>> redundant and can be removed from GEM struct so we can use the
> > >>>> 'credit' to add the dummy page
> > >>>> to GEM I will be happy to follow through.
> > >>>>
> > >>>> P.S Hash table is off the table anyway and we are talking only
> > >>>> about linked list here since by prefaulting
> > >>>> the entire VA range for a vmf->vma i will be avoiding redundant
> > >>>> page faults to same VMA VA range and so
> > >>>> don't need to search and reuse an existing dummy page but simply
> > >>>> create a new one for each next fault.
> > >>>>
> > >>>> Andrey
> > >>
> > 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-11 16:15                                 ` Daniel Vetter
@ 2021-01-11 17:41                                   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-11 17:41 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Deucher, Alexander,
	Koenig, Christian, yuq825


On 1/11/21 11:15 AM, Daniel Vetter wrote:
> On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
>> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>>> Ok then, I guess I will proceed with the dummy pages list implementation then.
>>>
>>> Andrey
>>>
>>> ________________________________
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: 08 January 2021 09:52
>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
>>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
>>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
>>>
>>> Mhm, I'm not aware of any let over pointer between TTM and GEM and we
>>> worked quite hard on reducing the size of the amdgpu_bo, so another
>>> extra pointer just for that corner case would suck quite a bit.
>> We have a ton of other pointers in struct amdgpu_bo (or any of it's lower
>> things) which are fairly single-use, so I'm really not much seeing the
>> point in making this a special case. It also means the lifetime management
>> becomes a bit iffy, since we can't throw away the dummy page then the last
>> reference to the bo is released (since we don't track it there), but only
>> when the last pointer to the device is released. Potentially this means a
>> pile of dangling pages hanging around for too long.
> Also if you really, really, really want to have this list, please don't
> reinvent it since we have it already. drmm_ is exactly meant for resources
> that should be freed when the final drm_device reference disappears.
> -Daniel


Can you elaborate? We still need to actually implement the list, but you want me
to use drmm_add_action() for its destruction instead of explicitly doing it
(like I'm already doing from ttm_bo_device_release()) ?

Andrey
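
[Editor's sketch, not code from the patch series: the names below
(dummy_page_entry, dummy_pages_release, dummy_pages_init) are invented
purely to illustrate the shape of Daniel's drmm_ suggestion, i.e.
registering a release action so the dummy-page list is torn down with
the final drm_device reference rather than by hand.]

```c
#include <linux/list.h>
#include <linux/slab.h>
#include <linux/mm.h>
#include <drm/drm_device.h>
#include <drm/drm_managed.h>

/* Hypothetical per-page bookkeeping entry. */
struct dummy_page_entry {
	struct list_head node;
	struct page *page;
};

/* Runs automatically when the last drm_device reference is dropped. */
static void dummy_pages_release(struct drm_device *dev, void *ptr)
{
	struct list_head *pages = ptr;
	struct dummy_page_entry *entry, *tmp;

	list_for_each_entry_safe(entry, tmp, pages, node) {
		list_del(&entry->node);
		__free_page(entry->page);
		kfree(entry);
	}
}

/* Called once at device init; fault handlers may then append entries
 * to the list without any explicit cleanup path of their own. */
static int dummy_pages_init(struct drm_device *dev, struct list_head *pages)
{
	INIT_LIST_HEAD(pages);
	return drmm_add_action(dev, dummy_pages_release, pages);
}
```

With this shape there is no call site in ttm_bo_device_release() at all;
the drmm_ machinery owns the teardown ordering.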


>   
>> If you need some ideas for redundant pointers:
>> - destroy callback (kinda not cool to not have this const anyway), we
>>    could refcount it all with the overall gem bo. Quite a bit of work.
>> - bdev pointer, if we move the device ttm stuff into struct drm_device, or
>>    create a common struct ttm_device, we can ditch that
>> - We could probably merge a few of the fields and find 8 bytes somewhere
>> - we still have 2 krefs, would probably need to fix that before we can
>>    merge the destroy callbacks
>>
>> So there's plenty of room still, if the size of a bo struct is really that
>> critical. Imo it's not.
>>
>>
>>> Christian.
>>>
>>> Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
>>>> Daniel had some objections to this (see bellow) and so I guess I need
>>>> you both to agree on the approach before I proceed.
>>>>
>>>> Andrey
>>>>
>>>> On 1/8/21 9:33 AM, Christian König wrote:
>>>>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>>>>> Hey Christian, just a ping.
>>>>> Was there any question for me here?
>>>>>
>>>>> As far as I can see the best approach would still be to fill the VMA
>>>>> with a single dummy page and avoid pointers in the GEM object.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>>>>> device is removed.
>>>>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
>>>>>>>>>>>>>> something we can do.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>>>>> this corner case, but why the linking ?
>>>>>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>>>>
>>>>>>>>>>>> What we can do is to allocate a page on demand for each fault
>>>>>>>>>>>> and link
>>>>>>>>>>>> the together in the bdev instead.
>>>>>>>>>>>>
>>>>>>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>>>>>>> application
>>>>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>> Hey, started to implement this and then realized that by
>>>>>>>>>>> allocating a page
>>>>>>>>>>> for each fault indiscriminately
>>>>>>>>>>> we will be allocating a new page for each faulting virtual
>>>>>>>>>>> address within a
>>>>>>>>>>> VA range belonging the same BO
>>>>>>>>>>> and this is obviously too much and not the intention. Should I
>>>>>>>>>>> instead use
>>>>>>>>>>> let's say a hashtable with the hash
>>>>>>>>>>> key being faulting BO address to actually keep allocating and
>>>>>>>>>>> reusing same
>>>>>>>>>>> dummy zero page per GEM BO
>>>>>>>>>>> (or for that matter DRM file object address for non imported
>>>>>>>>>>> BOs) ?
>>>>>>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>>>>>>> should
>>>>>>>>>> still be around:
>>>>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
>>>>>>>>>> a dma-buf,
>>>>>>>>>>      so defensively allocate a per-bo page
>>>>>>>>>> - otherwise allocate a per-file page
>>>>>>>>> That exactly what we have in current implementation
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Or is the idea to save the struct page * pointer? That feels a
>>>>>>>>>> bit like
>>>>>>>>>> over-optimizing stuff. Better to have a simple implementation
>>>>>>>>>> first and
>>>>>>>>>> then tune it if (and only if) any part of it becomes a problem
>>>>>>>>>> for normal
>>>>>>>>>> usage.
>>>>>>>>> Exactly - the idea is to avoid adding extra pointer to
>>>>>>>>> drm_gem_object,
>>>>>>>>> Christian suggested to instead keep a linked list of dummy pages
>>>>>>>>> to be
>>>>>>>>> allocated on demand once we hit a vm_fault. I will then also
>>>>>>>>> prefault the entire
>>>>>>>>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map
>>>>>>>>> them
>>>>>>>>> to that single dummy page.
>>>>>>>> This strongly feels like premature optimization. If you're worried
>>>>>>>> about
>>>>>>>> the overhead on amdgpu, pay down the debt by removing one of the
>>>>>>>> redundant
>>>>>>>> pointers between gem and ttm bo structs (I think we still have
>>>>>>>> some) :-)
>>>>>>>>
>>>>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>>>>> pointer just because" games with hashtables.
>>>>>>>> -Daniel
>>>>>>>
>>>>>>> Well, if you and Christian can agree on this approach and suggest
>>>>>>> maybe what pointer is
>>>>>>> redundant and can be removed from GEM struct so we can use the
>>>>>>> 'credit' to add the dummy page
>>>>>>> to GEM I will be happy to follow through.
>>>>>>>
>>>>>>> P.S Hash table is off the table anyway and we are talking only
>>>>>>> about linked list here since by prefaulting
>>>>>>> the entire VA range for a vmf->vma i will be avoiding redundant
>>>>>>> page faults to same VMA VA range and so
>>>>>>> don't need to search and reuse an existing dummy page but simply
>>>>>>> create a new one for each next fault.
>>>>>>>
>>>>>>> Andrey
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread


* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-11 17:41                                   ` Andrey Grodzovsky
  (?)
@ 2021-01-11 18:31                                   ` Andrey Grodzovsky
  2021-01-12  9:07                                     ` Daniel Vetter
  -1 siblings, 1 reply; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-11 18:31 UTC (permalink / raw)
  To: dri-devel


On 1/11/21 12:41 PM, Andrey Grodzovsky wrote:
>
> On 1/11/21 11:15 AM, Daniel Vetter wrote:
>> On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
>>> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>>>> Ok then, I guess I will proceed with the dummy pages list implementation then.
>>>>
>>>> Andrey
>>>>
>>>> ________________________________
>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>> Sent: 08 January 2021 09:52
>>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter 
>>>> <daniel@ffwll.ch>
>>>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; 
>>>> dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; 
>>>> daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org 
>>>> <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; 
>>>> yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; 
>>>> Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org 
>>>> <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; 
>>>> Wentland, Harry <Harry.Wentland@amd.com>
>>>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
>>>>
>>>> Mhm, I'm not aware of any let over pointer between TTM and GEM and we
>>>> worked quite hard on reducing the size of the amdgpu_bo, so another
>>>> extra pointer just for that corner case would suck quite a bit.
>>> We have a ton of other pointers in struct amdgpu_bo (or any of it's lower
>>> things) which are fairly single-use, so I'm really not much seeing the
>>> point in making this a special case. It also means the lifetime management
>>> becomes a bit iffy, since we can't throw away the dummy page then the last
>>> reference to the bo is released (since we don't track it there), but only
>>> when the last pointer to the device is released. Potentially this means a
>>> pile of dangling pages hanging around for too long.
>> Also if you really, really, really want to have this list, please don't
>> reinvent it since we have it already. drmm_ is exactly meant for resources
>> that should be freed when the final drm_device reference disappears.
>> -Daniel
>
>
> Can you elaborate ? We still need to actually implement the list but you want 
> me to use
> drmm_add_action for it's destruction instead of explicitly doing it (like I'm 
> already doing from  ttm_bo_device_release) ?
>
> Andrey


Oh, I get it, I think: you want me to allocate each page using drmm_kzalloc() so 
that when the drm_device dies it will be freed on its own.
Great idea, and it makes my implementation much less cumbersome.

Andrey
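
[Editor's sketch of the fault-handler side of this idea; it is not the
actual patch. dummy_page_fault() is an invented name, and the code
assumes the GEM mmap path set up a VM_PFNMAP VMA (as TTM does) and that
the dummy page is kmalloc-backed, so virt_to_page() on the drmm_kzalloc()
result is valid.]

```c
#include <linux/mm.h>
#include <drm/drm_device.h>
#include <drm/drm_gem.h>
#include <drm/drm_managed.h>

static vm_fault_t dummy_page_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	struct drm_gem_object *obj = vma->vm_private_data;
	unsigned long addr, pfn;
	vm_fault_t ret = VM_FAULT_SIGBUS;
	void *mem;

	/* One zeroed page per faulting VMA, tied to the drm_device
	 * lifetime: freed automatically when the final reference drops,
	 * so no list management is needed in the driver at all. */
	mem = drmm_kzalloc(obj->dev, PAGE_SIZE, GFP_KERNEL);
	if (!mem)
		return VM_FAULT_OOM;

	pfn = page_to_pfn(virt_to_page(mem));

	/* Prefault the entire VA range from vma->vm_start to
	 * vma->vm_end onto this single dummy page, so no further
	 * faults hit the same VMA. */
	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
		ret = vmf_insert_pfn(vma, addr, pfn);
		if (ret != VM_FAULT_NOPAGE)
			break;
	}
	return ret;
}
```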


>
>
>>> If you need some ideas for redundant pointers:
>>> - destroy callback (kinda not cool to not have this const anyway), we
>>>    could refcount it all with the overall gem bo. Quite a bit of work.
>>> - bdev pointer, if we move the device ttm stuff into struct drm_device, or
>>>    create a common struct ttm_device, we can ditch that
>>> - We could probably merge a few of the fields and find 8 bytes somewhere
>>> - we still have 2 krefs, would probably need to fix that before we can
>>>    merge the destroy callbacks
>>>
>>> So there's plenty of room still, if the size of a bo struct is really that
>>> critical. Imo it's not.
>>>
>>>
>>>> Christian.
>>>>
>>>> Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
>>>>> Daniel had some objections to this (see bellow) and so I guess I need
>>>>> you both to agree on the approach before I proceed.
>>>>>
>>>>> Andrey
>>>>>
>>>>> On 1/8/21 9:33 AM, Christian König wrote:
>>>>>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>>>>>> Hey Christian, just a ping.
>>>>>> Was there any question for me here?
>>>>>>
>>>>>> As far as I can see the best approach would still be to fill the VMA
>>>>>> with a single dummy page and avoid pointers in the GEM object.
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>>>>>> device is removed.
>>>>>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
>>>>>>>>>>>>>>> something we can do.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>>>>>> this corner case, but why the linking ?
>>>>>>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What we can do is to allocate a page on demand for each fault
>>>>>>>>>>>>> and link
>>>>>>>>>>>>> the together in the bdev instead.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>>>>>>>> application
>>>>>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>> Hey, started to implement this and then realized that by
>>>>>>>>>>>> allocating a page
>>>>>>>>>>>> for each fault indiscriminately
>>>>>>>>>>>> we will be allocating a new page for each faulting virtual
>>>>>>>>>>>> address within a
>>>>>>>>>>>> VA range belonging the same BO
>>>>>>>>>>>> and this is obviously too much and not the intention. Should I
>>>>>>>>>>>> instead use
>>>>>>>>>>>> let's say a hashtable with the hash
>>>>>>>>>>>> key being faulting BO address to actually keep allocating and
>>>>>>>>>>>> reusing same
>>>>>>>>>>>> dummy zero page per GEM BO
>>>>>>>>>>>> (or for that matter DRM file object address for non imported
>>>>>>>>>>>> BOs) ?
>>>>>>>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>>>>>>>> should
>>>>>>>>>>> still be around:
>>>>>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
>>>>>>>>>>> a dma-buf,
>>>>>>>>>>>      so defensively allocate a per-bo page
>>>>>>>>>>> - otherwise allocate a per-file page
>>>>>>>>>> That exactly what we have in current implementation
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Or is the idea to save the struct page * pointer? That feels a
>>>>>>>>>>> bit like
>>>>>>>>>>> over-optimizing stuff. Better to have a simple implementation
>>>>>>>>>>> first and
>>>>>>>>>>> then tune it if (and only if) any part of it becomes a problem
>>>>>>>>>>> for normal
>>>>>>>>>>> usage.
>>>>>>>>>> Exactly - the idea is to avoid adding extra pointer to
>>>>>>>>>> drm_gem_object,
>>>>>>>>>> Christian suggested to instead keep a linked list of dummy pages
>>>>>>>>>> to be
>>>>>>>>>> allocated on demand once we hit a vm_fault. I will then also
>>>>>>>>>> prefault the entire
>>>>>>>>>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map
>>>>>>>>>> them
>>>>>>>>>> to that single dummy page.
>>>>>>>>> This strongly feels like premature optimization. If you're worried
>>>>>>>>> about
>>>>>>>>> the overhead on amdgpu, pay down the debt by removing one of the
>>>>>>>>> redundant
>>>>>>>>> pointers between gem and ttm bo structs (I think we still have
>>>>>>>>> some) :-)
>>>>>>>>>
>>>>>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>>>>>> pointer just because" games with hashtables.
>>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>> Well, if you and Christian can agree on this approach and suggest
>>>>>>>> maybe what pointer is
>>>>>>>> redundant and can be removed from GEM struct so we can use the
>>>>>>>> 'credit' to add the dummy page
>>>>>>>> to GEM I will be happy to follow through.
>>>>>>>>
>>>>>>>> P.S Hash table is off the table anyway and we are talking only
>>>>>>>> about linked list here since by prefaulting
>>>>>>>> the entire VA range for a vmf->vma i will be avoiding redundant
>>>>>>>> page faults to same VMA VA range and so
>>>>>>>> don't need to search and reuse an existing dummy page but simply
>>>>>>>> create a new one for each next fault.
>>>>>>>>
>>>>>>>> Andrey
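
The prefaulting scheme quoted above (on the first fault, map the whole VA range of the faulting VMA to a single dummy page so no further faults hit that VMA) can be sketched with a userspace mock. None of the names below are real kernel structures; they only illustrate the idea:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/* Toy stand-ins for the kernel structures; illustrative only. */
struct fake_page { int dummy; };

struct fake_vma {
	uintptr_t vm_start;		/* inclusive */
	uintptr_t vm_end;		/* exclusive */
	struct fake_page **pte;		/* one slot per page in the range */
};

/*
 * On the first fault anywhere in the VMA, allocate one dummy page and
 * point every page of the range at it, so subsequent faults to the
 * same VMA VA range are avoided entirely.
 */
static struct fake_page *prefault_dummy(struct fake_vma *vma)
{
	size_t npages = (vma->vm_end - vma->vm_start) / PAGE_SIZE;
	struct fake_page *dummy = malloc(sizeof(*dummy));

	for (size_t i = 0; i < npages; i++)
		vma->pte[i] = dummy;	/* all PTEs share one page */
	return dummy;
}
```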
>>> -- 
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch/
>>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-11 16:15                                 ` Daniel Vetter
@ 2021-01-11 20:45                                   ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-11 20:45 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Deucher, Alexander,
	Koenig, Christian, yuq825


On 1/11/21 11:15 AM, Daniel Vetter wrote:
> On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
>> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>>> Ok then, I guess I will proceed with the dummy pages list implementation then.
>>>
>>> Andrey
>>>
>>> ________________________________
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: 08 January 2021 09:52
>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
>>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
>>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
>>>
>>> Mhm, I'm not aware of any left-over pointer between TTM and GEM and we
>>> worked quite hard on reducing the size of the amdgpu_bo, so another
>>> extra pointer just for that corner case would suck quite a bit.
>> We have a ton of other pointers in struct amdgpu_bo (or any of its lower
>> things) which are fairly single-use, so I'm really not much seeing the
>> point in making this a special case. It also means the lifetime management
>> becomes a bit iffy, since we can't throw away the dummy page when the last
>> reference to the bo is released (since we don't track it there), but only
>> when the last pointer to the device is released. Potentially this means a
>> pile of dangling pages hanging around for too long.
> Also if you really, really, really want to have this list, please don't
> reinvent it since we have it already. drmm_ is exactly meant for resources
> that should be freed when the final drm_device reference disappears.
> -Daniel


Maybe I was too eager too early: I need to explicitly allocate the dummy page 
using alloc_page(), so I cannot use drmm_kmalloc() for it directly. So once 
again, like with the list, I need to wrap it in a container struct which I can 
then allocate using drmm_kmalloc() and which holds the page pointer. But then 
on release the page itself needs to be freed, so I would supposedly need 
drmm_add_action() to free the page before the container struct is released, 
but drmm_kmalloc() doesn't allow setting a release action on the allocation. 
So I created a new drmm_kmalloc_with_action() API function, but then you also 
need to supply the optional data pointer for the release action (the struct 
page in this case), and this all becomes a bit overcomplicated (but doable). 
Is this extra API worth adding? Maybe it can be useful in general.

Andrey
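
The container-plus-release-action pattern described above can be simulated in plain userspace C. The mock_* names and the simplified resource list are illustrative stand-ins for the drmm_ infrastructure, and drmm_kmalloc_with_action() itself is only a proposal in this mail, not an existing kernel API:

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace mock of the drmm_ managed-resource pattern; not kernel API. */
struct fake_dev;
typedef void (*release_fn)(struct fake_dev *dev, void *data);

struct managed_res {
	struct managed_res *next;
	release_fn release;	/* optional action run before freeing */
	void *data;		/* e.g. the struct page to put */
	/* the drmm_kmalloc'ed payload (container) follows this header */
};

struct fake_dev { struct managed_res *resources; };

/* Analog of the proposed drmm_kmalloc_with_action(): allocate a managed
 * container and attach a release action plus its data pointer. */
static void *mock_drmm_kmalloc_with_action(struct fake_dev *dev, size_t size,
					   release_fn release, void *data)
{
	struct managed_res *res = malloc(sizeof(*res) + size);

	res->release = release;
	res->data = data;
	res->next = dev->resources;
	dev->resources = res;
	return res + 1;		/* payload follows the header */
}

/* On the final device reference drop: run all release actions, then
 * free the containers themselves. */
static void mock_dev_release(struct fake_dev *dev)
{
	while (dev->resources) {
		struct managed_res *res = dev->resources;

		dev->resources = res->next;
		if (res->release)
			res->release(dev, res->data);
		free(res);
	}
}
```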



>   
>> If you need some ideas for redundant pointers:
>> - destroy callback (kinda not cool to not have this const anyway), we
>>    could refcount it all with the overall gem bo. Quite a bit of work.
>> - bdev pointer, if we move the device ttm stuff into struct drm_device, or
>>    create a common struct ttm_device, we can ditch that
>> - We could probably merge a few of the fields and find 8 bytes somewhere
>> - we still have 2 krefs, would probably need to fix that before we can
>>    merge the destroy callbacks
>>
>> So there's plenty of room still, if the size of a bo struct is really that
>> critical. Imo it's not.
>>
>>
>>> Christian.
>>>
>>> Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
>>>> Daniel had some objections to this (see bellow) and so I guess I need
>>>> you both to agree on the approach before I proceed.
>>>>
>>>> Andrey
>>>>
>>>> On 1/8/21 9:33 AM, Christian König wrote:
>>>>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>>>>> Hey Christian, just a ping.
>>>>> Was there any question for me here?
>>>>>
>>>>> As far as I can see the best approach would still be to fill the VMA
>>>>> with a single dummy page and avoid pointers in the GEM object.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>>>>> device is removed.
>>>>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
>>>>>>>>>>>>>> something we can do.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>>>>> this corner case, but why the linking ?
>>>>>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>>>>
>>>>>>>>>>>> What we can do is to allocate a page on demand for each fault
>>>>>>>>>>>> and link
>>>>>>>>>>>> them together in the bdev instead.
>>>>>>>>>>>>
>>>>>>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>>>>>>> application
>>>>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>> Hey, started to implement this and then realized that by
>>>>>>>>>>> allocating a page
>>>>>>>>>>> for each fault indiscriminately
>>>>>>>>>>> we will be allocating a new page for each faulting virtual
>>>>>>>>>>> address within a
>>>>>>>>>>> VA range belonging to the same BO
>>>>>>>>>>> and this is obviously too much and not the intention. Should I
>>>>>>>>>>> instead use
>>>>>>>>>>> let's say a hashtable with the hash
>>>>>>>>>>> key being faulting BO address to actually keep allocating and
>>>>>>>>>>> reusing same
>>>>>>>>>>> dummy zero page per GEM BO
>>>>>>>>>>> (or for that matter DRM file object address for non imported
>>>>>>>>>>> BOs) ?
>>>>>>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>>>>>>> should
>>>>>>>>>> still be around:
>>>>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
>>>>>>>>>> a dma-buf,
>>>>>>>>>>      so defensively allocate a per-bo page
>>>>>>>>>> - otherwise allocate a per-file page
>>>>>>>>> That's exactly what we have in the current implementation
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Or is the idea to save the struct page * pointer? That feels a
>>>>>>>>>> bit like
>>>>>>>>>> over-optimizing stuff. Better to have a simple implementation
>>>>>>>>>> first and
>>>>>>>>>> then tune it if (and only if) any part of it becomes a problem
>>>>>>>>>> for normal
>>>>>>>>>> usage.
>>>>>>>>> Exactly - the idea is to avoid adding extra pointer to
>>>>>>>>> drm_gem_object,
>>>>>>>>> Christian suggested to instead keep a linked list of dummy pages
>>>>>>>>> to be
>>>>>>>>> allocated on demand once we hit a vm_fault. I will then also
>>>>>>>>> prefault the entire
>>>>>>>>> VA range from vma->vm_start to vma->vm_end and map
>>>>>>>>> them
>>>>>>>>> to that single dummy page.
>>>>>>>> This strongly feels like premature optimization. If you're worried
>>>>>>>> about
>>>>>>>> the overhead on amdgpu, pay down the debt by removing one of the
>>>>>>>> redundant
>>>>>>>> pointers between gem and ttm bo structs (I think we still have
>>>>>>>> some) :-)
>>>>>>>>
>>>>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>>>>> pointer just because" games with hashtables.
>>>>>>>> -Daniel
>>>>>>>
>>>>>>> Well, if you and Christian can agree on this approach and suggest
>>>>>>> maybe what pointer is
>>>>>>> redundant and can be removed from GEM struct so we can use the
>>>>>>> 'credit' to add the dummy page
>>>>>>> to GEM I will be happy to follow through.
>>>>>>>
>>>>>>> P.S Hash table is off the table anyway and we are talking only
>>>>>>> about linked list here since by prefaulting
>>>>>>> the entire VA range for a vmf->vma i will be avoiding redundant
>>>>>>> page faults to same VMA VA range and so
>>>>>>> don't need to search and reuse an existing dummy page but simply
>>>>>>> create a new one for each next fault.
>>>>>>>
>>>>>>> Andrey
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch/


* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2021-01-11 20:45                                   ` Andrey Grodzovsky
  0 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-11 20:45 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: robh, daniel.vetter, amd-gfx, eric, ppaalanen, dri-devel, gregkh,
	Deucher, Alexander, l.stach, Wentland, Harry, Koenig, Christian,
	yuq825


On 1/11/21 11:15 AM, Daniel Vetter wrote:
> On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
>> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>>> Ok then, I guess I will proceed with the dummy pages list implementation then.
>>>
>>> Andrey
>>>
>>> ________________________________
>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>> Sent: 08 January 2021 09:52
>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
>>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
>>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
>>>
>>> Mhm, I'm not aware of any left-over pointer between TTM and GEM and we
>>> worked quite hard on reducing the size of the amdgpu_bo, so another
>>> extra pointer just for that corner case would suck quite a bit.
>> We have a ton of other pointers in struct amdgpu_bo (or any of its lower
>> things) which are fairly single-use, so I'm really not much seeing the
>> point in making this a special case. It also means the lifetime management
>> becomes a bit iffy, since we can't throw away the dummy page when the last
>> reference to the bo is released (since we don't track it there), but only
>> when the last pointer to the device is released. Potentially this means a
>> pile of dangling pages hanging around for too long.
> Also if you really, really, really want to have this list, please don't
> reinvent it since we have it already. drmm_ is exactly meant for resources
> that should be freed when the final drm_device reference disappears.
> -Daniel


Maybe I was too eager too early: I need to explicitly allocate the dummy page 
using alloc_page(), so I cannot use drmm_kmalloc() for it directly. So once 
again, like with the list, I need to wrap it in a container struct which I can 
then allocate using drmm_kmalloc() and which holds the page pointer. But then 
on release the page itself needs to be freed, so I would supposedly need 
drmm_add_action() to free the page before the container struct is released, 
but drmm_kmalloc() doesn't allow setting a release action on the allocation. 
So I created a new drmm_kmalloc_with_action() API function, but then you also 
need to supply the optional data pointer for the release action (the struct 
page in this case), and this all becomes a bit overcomplicated (but doable). 
Is this extra API worth adding? Maybe it can be useful in general.

Andrey



>   
>> If you need some ideas for redundant pointers:
>> - destroy callback (kinda not cool to not have this const anyway), we
>>    could refcount it all with the overall gem bo. Quite a bit of work.
>> - bdev pointer, if we move the device ttm stuff into struct drm_device, or
>>    create a common struct ttm_device, we can ditch that
>> - We could probably merge a few of the fields and find 8 bytes somewhere
>> - we still have 2 krefs, would probably need to fix that before we can
>>    merge the destroy callbacks
>>
>> So there's plenty of room still, if the size of a bo struct is really that
>> critical. Imo it's not.
>>
>>
>>> Christian.
>>>
>>> Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
>>>> Daniel had some objections to this (see bellow) and so I guess I need
>>>> you both to agree on the approach before I proceed.
>>>>
>>>> Andrey
>>>>
>>>> On 1/8/21 9:33 AM, Christian König wrote:
>>>>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>>>>> Hey Christian, just a ping.
>>>>> Was there any question for me here?
>>>>>
>>>>> As far as I can see the best approach would still be to fill the VMA
>>>>> with a single dummy page and avoid pointers in the GEM object.
>>>>>
>>>>> Christian.
>>>>>
>>>>>> Andrey
>>>>>>
>>>>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>>>>> device is removed.
>>>>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
>>>>>>>>>>>>>> something we can do.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>>>>> this corner case, but why the linking ?
>>>>>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>>>>
>>>>>>>>>>>> What we can do is to allocate a page on demand for each fault
>>>>>>>>>>>> and link
>>>>>>>>>>>> them together in the bdev instead.
>>>>>>>>>>>>
>>>>>>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>>>>>>> application
>>>>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>>>>
>>>>>>>>>>>> Christian.
>>>>>>>>>>> Hey, started to implement this and then realized that by
>>>>>>>>>>> allocating a page
>>>>>>>>>>> for each fault indiscriminately
>>>>>>>>>>> we will be allocating a new page for each faulting virtual
>>>>>>>>>>> address within a
>>>>>>>>>>> VA range belonging to the same BO
>>>>>>>>>>> and this is obviously too much and not the intention. Should I
>>>>>>>>>>> instead use
>>>>>>>>>>> let's say a hashtable with the hash
>>>>>>>>>>> key being faulting BO address to actually keep allocating and
>>>>>>>>>>> reusing same
>>>>>>>>>>> dummy zero page per GEM BO
>>>>>>>>>>> (or for that matter DRM file object address for non imported
>>>>>>>>>>> BOs) ?
>>>>>>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>>>>>>> should
>>>>>>>>>> still be around:
>>>>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
>>>>>>>>>> a dma-buf,
>>>>>>>>>>      so defensively allocate a per-bo page
>>>>>>>>>> - otherwise allocate a per-file page
>>>>>>>>> That's exactly what we have in the current implementation
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Or is the idea to save the struct page * pointer? That feels a
>>>>>>>>>> bit like
>>>>>>>>>> over-optimizing stuff. Better to have a simple implementation
>>>>>>>>>> first and
>>>>>>>>>> then tune it if (and only if) any part of it becomes a problem
>>>>>>>>>> for normal
>>>>>>>>>> usage.
>>>>>>>>> Exactly - the idea is to avoid adding extra pointer to
>>>>>>>>> drm_gem_object,
>>>>>>>>> Christian suggested to instead keep a linked list of dummy pages
>>>>>>>>> to be
>>>>>>>>> allocated on demand once we hit a vm_fault. I will then also
>>>>>>>>> prefault the entire
>>>>>>>>> VA range from vma->vm_start to vma->vm_end and map
>>>>>>>>> them
>>>>>>>>> to that single dummy page.
>>>>>>>> This strongly feels like premature optimization. If you're worried
>>>>>>>> about
>>>>>>>> the overhead on amdgpu, pay down the debt by removing one of the
>>>>>>>> redundant
>>>>>>>> pointers between gem and ttm bo structs (I think we still have
>>>>>>>> some) :-)
>>>>>>>>
>>>>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>>>>> pointer just because" games with hashtables.
>>>>>>>> -Daniel
>>>>>>>
>>>>>>> Well, if you and Christian can agree on this approach and suggest
>>>>>>> maybe what pointer is
>>>>>>> redundant and can be removed from GEM struct so we can use the
>>>>>>> 'credit' to add the dummy page
>>>>>>> to GEM I will be happy to follow through.
>>>>>>>
>>>>>>> P.S Hash table is off the table anyway and we are talking only
>>>>>>> about linked list here since by prefaulting
>>>>>>> the entire VA range for a vmf->vma i will be avoiding redundant
>>>>>>> page faults to same VMA VA range and so
>>>>>>> don't need to search and reuse an existing dummy page but simply
>>>>>>> create a new one for each next fault.
>>>>>>>
>>>>>>> Andrey
>> -- 
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch/
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-11 16:13                               ` Daniel Vetter
@ 2021-01-12  8:12                                 ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2021-01-12  8:12 UTC (permalink / raw)
  To: Daniel Vetter, Grodzovsky, Andrey
  Cc: amd-gfx, daniel.vetter, dri-devel, yuq825, gregkh, Deucher, Alexander

Am 11.01.21 um 17:13 schrieb Daniel Vetter:
> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>> Ok then, I guess I will proceed with the dummy pages list implementation then.
>>
>> Andrey
>>
>> ________________________________
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: 08 January 2021 09:52
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
>>
>> Mhm, I'm not aware of any left-over pointer between TTM and GEM and we
>> worked quite hard on reducing the size of the amdgpu_bo, so another
>> extra pointer just for that corner case would suck quite a bit.
> We have a ton of other pointers in struct amdgpu_bo (or any of its lower
> things) which are fairly single-use, so I'm really not much seeing the
> point in making this a special case. It also means the lifetime management
> becomes a bit iffy, since we can't throw away the dummy page when the last
> reference to the bo is released (since we don't track it there), but only
> when the last pointer to the device is released. Potentially this means a
> pile of dangling pages hanging around for too long.

Yeah, all of them are already on my TODO list, but see below.

> If you need some ideas for redundant pointers:
> - destroy callback (kinda not cool to not have this const anyway), we
>    could refcount it all with the overall gem bo. Quite a bit of work.

The bigger problem is that TTM-based drivers use the destroy 
callback pointer to distinguish ghost objects from real ones.

We first need to get rid of those. I already have a plan for that and 
~20% of it implemented, but it is more complicated because of the 
driver-specific backends in Nouveau, Amdgpu and vmwgfx.

> - bdev pointer, if we move the device ttm stuff into struct drm_device, or
>    create a common struct ttm_device, we can ditch that

Yes, exactly that's what my device structure rename patch set is aiming 
for :)

> - We could probably merge a few of the fields and find 8 bytes somewhere

Please point out where.

> - we still have 2 krefs, would probably need to fix that before we can
>    merge the destroy callbacks

Yes, already on my TODO list as well. But the last time I looked into 
this I was blocked by the struct_mutex once more.

> So there's plenty of room still, if the size of a bo struct is really that
> critical. Imo it's not.

It is. See, we had a size of struct amdgpu_bo of over 1500 bytes because 
we stopped caring about that; now we are down to 816 at the moment.

We really need to get rid of this duplication of functionality and 
structure between TTM and GEM.

Christian.

> -Daniel
>
>
>> Christian.
>>
>> Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
>>> Daniel had some objections to this (see bellow) and so I guess I need
>>> you both to agree on the approach before I proceed.
>>>
>>> Andrey
>>>
>>> On 1/8/21 9:33 AM, Christian König wrote:
>>>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>>>> Hey Christian, just a ping.
>>>> Was there any question for me here?
>>>>
>>>> As far as I can see the best approach would still be to fill the VMA
>>>> with a single dummy page and avoid pointers in the GEM object.
>>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>>>> device is removed.
>>>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
>>>>>>>>>>>>> something we can do.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>>>> this corner case, but why the linking ?
>>>>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>>>
>>>>>>>>>>> What we can do is to allocate a page on demand for each fault
>>>>>>>>>>> and link
>>>>>>>>>>> them together in the bdev instead.
>>>>>>>>>>>
>>>>>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>>>>>> application
>>>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>> Hey, started to implement this and then realized that by
>>>>>>>>>> allocating a page
>>>>>>>>>> for each fault indiscriminately
>>>>>>>>>> we will be allocating a new page for each faulting virtual
>>>>>>>>>> address within a
>>>>>>>>>> VA range belonging to the same BO
>>>>>>>>>> and this is obviously too much and not the intention. Should I
>>>>>>>>>> instead use
>>>>>>>>>> let's say a hashtable with the hash
>>>>>>>>>> key being faulting BO address to actually keep allocating and
>>>>>>>>>> reusing same
>>>>>>>>>> dummy zero page per GEM BO
>>>>>>>>>> (or for that matter DRM file object address for non imported
>>>>>>>>>> BOs) ?
>>>>>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>>>>>> should
>>>>>>>>> still be around:
>>>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
>>>>>>>>> a dma-buf,
>>>>>>>>>      so defensively allocate a per-bo page
>>>>>>>>> - otherwise allocate a per-file page
>>>>>>>> That's exactly what we have in the current implementation
>>>>>>>>
>>>>>>>>
>>>>>>>>> Or is the idea to save the struct page * pointer? That feels a
>>>>>>>>> bit like
>>>>>>>>> over-optimizing stuff. Better to have a simple implementation
>>>>>>>>> first and
>>>>>>>>> then tune it if (and only if) any part of it becomes a problem
>>>>>>>>> for normal
>>>>>>>>> usage.
>>>>>>>> Exactly - the idea is to avoid adding extra pointer to
>>>>>>>> drm_gem_object,
>>>>>>>> Christian suggested to instead keep a linked list of dummy pages
>>>>>>>> to be
>>>>>>>> allocated on demand once we hit a vm_fault. I will then also
>>>>>>>> prefault the entire
>>>>>>>> VA range from vma->vm_start to vma->vm_end and map
>>>>>>>> them
>>>>>>>> to that single dummy page.
>>>>>>> This strongly feels like premature optimization. If you're worried
>>>>>>> about
>>>>>>> the overhead on amdgpu, pay down the debt by removing one of the
>>>>>>> redundant
>>>>>>> pointers between gem and ttm bo structs (I think we still have
>>>>>>> some) :-)
>>>>>>>
>>>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>>>> pointer just because" games with hashtables.
>>>>>>> -Daniel
>>>>>>
>>>>>> Well, if you and Christian can agree on this approach and suggest
>>>>>> maybe what pointer is
>>>>>> redundant and can be removed from GEM struct so we can use the
>>>>>> 'credit' to add the dummy page
>>>>>> to GEM I will be happy to follow through.
>>>>>>
>>>>>> P.S Hash table is off the table anyway and we are talking only
>>>>>> about linked list here since by prefaulting
>>>>>> the entire VA range for a vmf->vma i will be avoiding redundant
>>>>>> page faults to same VMA VA range and so
>>>>>> don't need to search and reuse an existing dummy page but simply
>>>>>> create a new one for each next fault.
>>>>>>
>>>>>> Andrey
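
Daniel's per-bo versus per-file split quoted above (a per-bo dummy page when the buffer is exported as a dma-buf, a shared per-file page otherwise, allocated lazily on fault) can be sketched as a small userspace mock. All fake_* names are invented for illustration, and calloc() stands in for alloc_page():

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace mocks; none of these are real DRM structures. */
struct fake_page { int dummy; };

struct fake_file { struct fake_page *dummy_page; };

struct fake_bo {
	void *dma_buf;			/* non-NULL if exported as dma-buf */
	struct fake_page *dummy_page;	/* per-bo page for the exported case */
	struct fake_file *file;
};

/* On fault after unplug: reuse the per-bo page for exported buffers,
 * otherwise share one per-file page; allocate lazily on first fault. */
static struct fake_page *fault_dummy_page(struct fake_bo *bo)
{
	struct fake_page **slot =
		bo->dma_buf ? &bo->dummy_page : &bo->file->dummy_page;

	if (!*slot)
		*slot = calloc(1, sizeof(**slot));	/* alloc_page() stand-in */
	return *slot;
}
```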



* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2021-01-12  8:12                                 ` Christian König
  0 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2021-01-12  8:12 UTC (permalink / raw)
  To: Daniel Vetter, Grodzovsky, Andrey
  Cc: robh, amd-gfx, daniel.vetter, dri-devel, eric, ppaalanen, yuq825,
	gregkh, Deucher, Alexander, Wentland, Harry, l.stach

Am 11.01.21 um 17:13 schrieb Daniel Vetter:
> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>> Ok then, I guess I will proceed with the dummy pages list implementation then.
>>
>> Andrey
>>
>> ________________________________
>> From: Koenig, Christian <Christian.Koenig@amd.com>
>> Sent: 08 January 2021 09:52
>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
>>
>> Mhm, I'm not aware of any left-over pointer between TTM and GEM and we
>> worked quite hard on reducing the size of the amdgpu_bo, so another
>> extra pointer just for that corner case would suck quite a bit.
> We have a ton of other pointers in struct amdgpu_bo (or any of its lower
> things) which are fairly single-use, so I'm really not much seeing the
> point in making this a special case. It also means the lifetime management
> becomes a bit iffy, since we can't throw away the dummy page when the last
> reference to the bo is released (since we don't track it there), but only
> when the last pointer to the device is released. Potentially this means a
> pile of dangling pages hanging around for too long.

Yeah, all of them are already on my TODO list, but see below.

> If you need some ideas for redundant pointers:
> - destroy callback (kinda not cool to not have this const anyway), we
>    could refcount it all with the overall gem bo. Quite a bit of work.

The bigger problem is that TTM-based drivers are using the destroy 
callback pointer to distinguish ghost objects from real ones.

We first need to get rid of those. I already have a plan for that and 
~20% of it implemented, but it is more complicated because of the 
driver-specific backends in Nouveau, amdgpu and vmwgfx.

> - bdev pointer, if we move the device ttm stuff into struct drm_device, or
>    create a common struct ttm_device, we can ditch that

Yes, exactly that's what my device structure rename patch set is aiming 
for :)

> - We could probably merge a few of the fields and find 8 bytes somewhere

Please point out where.

> - we still have 2 krefs, would probably need to fix that before we can
>    merge the destroy callbacks

Yes, already on my TODO list as well. But the last time I looked into 
this I was blocked by the struct_mutex once more.

> So there's plenty of room still, if the size of a bo struct is really that
> critical. Imo it's not.

It is. See, we had a struct amdgpu_bo size of over 1500 bytes because 
we stopped caring about that; now we are down to 816 bytes at the moment.

We really need to get rid of this duplication of functionality and 
structure between TTM and GEM.

Christian.

> -Daniel
>
>
>> Christian.
>>
>> Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
>>> Daniel had some objections to this (see bellow) and so I guess I need
>>> you both to agree on the approach before I proceed.
>>>
>>> Andrey
>>>
>>> On 1/8/21 9:33 AM, Christian König wrote:
>>>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
>>>>> Hey Christian, just a ping.
>>>> Was there any question for me here?
>>>>
>>>> As far as I can see the best approach would still be to fill the VMA
>>>> with a single dummy page and avoid pointers in the GEM object.
>>>>
>>>> Christian.
>>>>
>>>>> Andrey
>>>>>
>>>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
>>>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
>>>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
>>>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
>>>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
>>>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
>>>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
>>>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
>>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
>>>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
>>>>>>>>>>>>>> device is removed.
>>>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
>>>>>>>>>>>>> something we can do.
>>>>>>>>>>>>>
>>>>>>>>>>>>> We need to find a different approach here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
>>>>>>>>>>>>> so they are freed when the device is finally reaped?
>>>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
>>>>>>>>>>>> this corner case, but why the linking ?
>>>>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
>>>>>>>>>>> I want to avoid keeping the page in the GEM object.
>>>>>>>>>>>
>>>>>>>>>>> What we can do is to allocate a page on demand for each fault
>>>>>>>>>>> and link
>>>>>>>>>>> the together in the bdev instead.
>>>>>>>>>>>
>>>>>>>>>>> And when the bdev is then finally destroyed after the last
>>>>>>>>>>> application
>>>>>>>>>>> closed we can finally release all of them.
>>>>>>>>>>>
>>>>>>>>>>> Christian.
>>>>>>>>>> Hey, started to implement this and then realized that by
>>>>>>>>>> allocating a page
>>>>>>>>>> for each fault indiscriminately
>>>>>>>>>> we will be allocating a new page for each faulting virtual
>>>>>>>>>> address within a
>>>>>>>>>> VA range belonging the same BO
>>>>>>>>>> and this is obviously too much and not the intention. Should I
>>>>>>>>>> instead use
>>>>>>>>>> let's say a hashtable with the hash
>>>>>>>>>> key being faulting BO address to actually keep allocating and
>>>>>>>>>> reusing same
>>>>>>>>>> dummy zero page per GEM BO
>>>>>>>>>> (or for that matter DRM file object address for non imported
>>>>>>>>>> BOs) ?
>>>>>>>>> Why do we need a hashtable? All the sw structures to track this
>>>>>>>>> should
>>>>>>>>> still be around:
>>>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
>>>>>>>>> a dma-buf,
>>>>>>>>>      so defensively allocate a per-bo page
>>>>>>>>> - otherwise allocate a per-file page
>>>>>>>> That exactly what we have in current implementation
>>>>>>>>
>>>>>>>>
>>>>>>>>> Or is the idea to save the struct page * pointer? That feels a
>>>>>>>>> bit like
>>>>>>>>> over-optimizing stuff. Better to have a simple implementation
>>>>>>>>> first and
>>>>>>>>> then tune it if (and only if) any part of it becomes a problem
>>>>>>>>> for normal
>>>>>>>>> usage.
>>>>>>>> Exactly - the idea is to avoid adding extra pointer to
>>>>>>>> drm_gem_object,
>>>>>>>> Christian suggested to instead keep a linked list of dummy pages
>>>>>>>> to be
>>>>>>>> allocated on demand once we hit a vm_fault. I will then also
>>>>>>>> prefault the entire
>>>>>>>> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map
>>>>>>>> them
>>>>>>>> to that single dummy page.
>>>>>>> This strongly feels like premature optimization. If you're worried
>>>>>>> about
>>>>>>> the overhead on amdgpu, pay down the debt by removing one of the
>>>>>>> redundant
>>>>>>> pointers between gem and ttm bo structs (I think we still have
>>>>>>> some) :-)
>>>>>>>
>>>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
>>>>>>> pointer just because" games with hashtables.
>>>>>>> -Daniel
>>>>>>
>>>>>> Well, if you and Christian can agree on this approach and suggest
>>>>>> maybe what pointer is
>>>>>> redundant and can be removed from GEM struct so we can use the
>>>>>> 'credit' to add the dummy page
>>>>>> to GEM I will be happy to follow through.
>>>>>>
>>>>>> P.S Hash table is off the table anyway and we are talking only
>>>>>> about linked list here since by prefaulting
>>>>>> the entire VA range for a vmf->vma i will be avoiding redundant
>>>>>> page faults to same VMA VA range and so
>>>>>> don't need to search and reuse an existing dummy page but simply
>>>>>> create a new one for each next fault.
>>>>>>
>>>>>> Andrey

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-11 18:31                                   ` Andrey Grodzovsky
@ 2021-01-12  9:07                                     ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2021-01-12  9:07 UTC (permalink / raw)
  To: Andrey Grodzovsky; +Cc: dri-devel

On Mon, Jan 11, 2021 at 01:31:00PM -0500, Andrey Grodzovsky wrote:
> 
> On 1/11/21 12:41 PM, Andrey Grodzovsky wrote:
> > 
> > On 1/11/21 11:15 AM, Daniel Vetter wrote:
> > > On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
> > > > On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
> > > > > Ok then, I guess I will proceed with the dummy pages list implementation then.
> > > > > 
> > > > > Andrey
> > > > > 
> > > > > ________________________________
> > > > > From: Koenig, Christian <Christian.Koenig@amd.com>
> > > > > Sent: 08 January 2021 09:52
> > > > > To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel
> > > > > Vetter <daniel@ffwll.ch>
> > > > > Cc: amd-gfx@lists.freedesktop.org
> > > > > <amd-gfx@lists.freedesktop.org>;
> > > > > dri-devel@lists.freedesktop.org
> > > > > <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch
> > > > > <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>;
> > > > > l.stach@pengutronix.de <l.stach@pengutronix.de>;
> > > > > yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net
> > > > > <eric@anholt.net>; Deucher, Alexander
> > > > > <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org
> > > > > <gregkh@linuxfoundation.org>; ppaalanen@gmail.com
> > > > > <ppaalanen@gmail.com>; Wentland, Harry
> > > > > <Harry.Wentland@amd.com>
> > > > > Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
> > > > > 
> > > > > Mhm, I'm not aware of any let over pointer between TTM and GEM and we
> > > > > worked quite hard on reducing the size of the amdgpu_bo, so another
> > > > > extra pointer just for that corner case would suck quite a bit.
> > > > We have a ton of other pointers in struct amdgpu_bo (or any of it's lower
> > > > things) which are fairly single-use, so I'm really not much seeing the
> > > > point in making this a special case. It also means the lifetime management
> > > > becomes a bit iffy, since we can't throw away the dummy page then the last
> > > > reference to the bo is released (since we don't track it there), but only
> > > > when the last pointer to the device is released. Potentially this means a
> > > > pile of dangling pages hanging around for too long.
> > > Also if you really, really, really want to have this list, please don't
> > > reinvent it since we have it already. drmm_ is exactly meant for resources
> > > that should be freed when the final drm_device reference disappears.
> > > -Daniel
> > 
> > 
> > Can you elaborate? We still need to actually implement the list, but you
> > want me to use drmm_add_action for its destruction instead of explicitly
> > doing it (like I'm already doing from ttm_bo_device_release)?
> > 
> > Andrey
> 
> 
> Oh, I get it, I think: you want me to allocate each page using drmm_kzalloc
> so when the drm_device dies it will be freed on its own.
> Great idea, and it makes my implementation much less cumbersome.

That was my idea, but now after a night's worth of sleep I'm not so sure
it's a bright one: we don't just want 4k of memory, we want a page. And
I'm not sure kzalloc will give us that (plus using a slab page for mmap
might result in a fireworks show).

So maybe just drmm_add_action_or_reset (since I'm also not sure we can
just use the lists in struct page itself for the page we got when we use
alloc_page).
-Daniel

> 
> Andrey
> 
> 
> > 
> > 
> > > > If you need some ideas for redundant pointers:
> > > > - destroy callback (kinda not cool to not have this const anyway), we
> > > >    could refcount it all with the overall gem bo. Quite a bit of work.
> > > > - bdev pointer, if we move the device ttm stuff into struct drm_device, or
> > > >    create a common struct ttm_device, we can ditch that
> > > > - We could probably merge a few of the fields and find 8 bytes somewhere
> > > > - we still have 2 krefs, would probably need to fix that before we can
> > > >    merge the destroy callbacks
> > > > 
> > > > So there's plenty of room still, if the size of a bo struct is really that
> > > > critical. Imo it's not.
> > > > 
> > > > 
> > > > > Christian.
> > > > > 
> > > > > Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> > > > > > Daniel had some objections to this (see bellow) and so I guess I need
> > > > > > you both to agree on the approach before I proceed.
> > > > > > 
> > > > > > Andrey
> > > > > > 
> > > > > > On 1/8/21 9:33 AM, Christian König wrote:
> > > > > > > Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> > > > > > > > Hey Christian, just a ping.
> > > > > > > Was there any question for me here?
> > > > > > > 
> > > > > > > As far as I can see the best approach would still be to fill the VMA
> > > > > > > with a single dummy page and avoid pointers in the GEM object.
> > > > > > > 
> > > > > > > Christian.
> > > > > > > 
> > > > > > > > Andrey
> > > > > > > > 
> > > > > > > > On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
> > > > > > > > > On 1/7/21 11:30 AM, Daniel Vetter wrote:
> > > > > > > > > > On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> > > > > > > > > > > On 1/7/21 11:21 AM, Daniel Vetter wrote:
> > > > > > > > > > > > On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> > > > > > > > > > > > > On 11/23/20 3:01 AM, Christian König wrote:
> > > > > > > > > > > > > > Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > > On 11/21/20 9:15 AM, Christian König wrote:
> > > > > > > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > > > > Will be used to reroute CPU mapped BO's page faults once
> > > > > > > > > > > > > > > > > device is removed.
> > > > > > > > > > > > > > > > Uff, one page for each exported DMA-buf? That's not
> > > > > > > > > > > > > > > > something we can do.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > We need to find a different approach here.
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > Can't we call alloc_page() on each fault and link them together
> > > > > > > > > > > > > > > > so they are freed when the device is finally reaped?
> > > > > > > > > > > > > > > For sure better to optimize and allocate on demand when we reach
> > > > > > > > > > > > > > > this corner case, but why the linking ?
> > > > > > > > > > > > > > > Shouldn't drm_prime_gem_destroy be good enough place to free ?
> > > > > > > > > > > > > > I want to avoid keeping the page in the GEM object.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > What we can do is to allocate a page on demand for each fault
> > > > > > > > > > > > > > and link
> > > > > > > > > > > > > > the together in the bdev instead.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > And when the bdev is then finally destroyed after the last
> > > > > > > > > > > > > > application
> > > > > > > > > > > > > > closed we can finally release all of them.
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > Christian.
> > > > > > > > > > > > > Hey, started to implement this and then realized that by
> > > > > > > > > > > > > allocating a page
> > > > > > > > > > > > > for each fault indiscriminately
> > > > > > > > > > > > > we will be allocating a new page for each faulting virtual
> > > > > > > > > > > > > address within a
> > > > > > > > > > > > > VA range belonging the same BO
> > > > > > > > > > > > > and this is obviously too much and not the intention. Should I
> > > > > > > > > > > > > instead use
> > > > > > > > > > > > > let's say a hashtable with the hash
> > > > > > > > > > > > > key being faulting BO address to actually keep allocating and
> > > > > > > > > > > > > reusing same
> > > > > > > > > > > > > dummy zero page per GEM BO
> > > > > > > > > > > > > (or for that matter DRM file object address for non imported
> > > > > > > > > > > > > BOs) ?
> > > > > > > > > > > > Why do we need a hashtable? All the sw structures to track this
> > > > > > > > > > > > should
> > > > > > > > > > > > still be around:
> > > > > > > > > > > > - if gem_bo->dma_buf is set the buffer is currently exported as
> > > > > > > > > > > > a dma-buf,
> > > > > > > > > > > >      so defensively allocate a per-bo page
> > > > > > > > > > > > - otherwise allocate a per-file page
> > > > > > > > > > > That exactly what we have in current implementation
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > > Or is the idea to save the struct page * pointer? That feels a
> > > > > > > > > > > > bit like
> > > > > > > > > > > > over-optimizing stuff. Better to have a simple implementation
> > > > > > > > > > > > first and
> > > > > > > > > > > > then tune it if (and only if) any part of it becomes a problem
> > > > > > > > > > > > for normal
> > > > > > > > > > > > usage.
> > > > > > > > > > > Exactly - the idea is to avoid adding extra pointer to
> > > > > > > > > > > drm_gem_object,
> > > > > > > > > > > Christian suggested to instead keep a linked list of dummy pages
> > > > > > > > > > > to be
> > > > > > > > > > > allocated on demand once we hit a vm_fault. I will then also
> > > > > > > > > > > prefault the entire
> > > > > > > > > > > VA range from vma->vm_end - vma->vm_start to vma->vm_end and map
> > > > > > > > > > > them
> > > > > > > > > > > to that single dummy page.
> > > > > > > > > > This strongly feels like premature optimization. If you're worried
> > > > > > > > > > about
> > > > > > > > > > the overhead on amdgpu, pay down the debt by removing one of the
> > > > > > > > > > redundant
> > > > > > > > > > pointers between gem and ttm bo structs (I think we still have
> > > > > > > > > > some) :-)
> > > > > > > > > > 
> > > > > > > > > > Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> > > > > > > > > > pointer just because" games with hashtables.
> > > > > > > > > > -Daniel
> > > > > > > > > 
> > > > > > > > > Well, if you and Christian can agree on this approach and suggest
> > > > > > > > > maybe what pointer is
> > > > > > > > > redundant and can be removed from GEM struct so we can use the
> > > > > > > > > 'credit' to add the dummy page
> > > > > > > > > to GEM I will be happy to follow through.
> > > > > > > > > 
> > > > > > > > > P.S Hash table is off the table anyway and we are talking only
> > > > > > > > > about linked list here since by prefaulting
> > > > > > > > > the entire VA range for a vmf->vma i will be avoiding redundant
> > > > > > > > > page faults to same VMA VA range and so
> > > > > > > > > don't need to search and reuse an existing dummy page but simply
> > > > > > > > > create a new one for each next fault.
> > > > > > > > > 
> > > > > > > > > Andrey
> > > > -- 
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> > > > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-11 20:45                                   ` Andrey Grodzovsky
@ 2021-01-12  9:10                                     ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2021-01-12  9:10 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: daniel.vetter, dri-devel, amd-gfx, gregkh, Deucher, Alexander,
	Koenig, Christian, yuq825

On Mon, Jan 11, 2021 at 03:45:10PM -0500, Andrey Grodzovsky wrote:
> 
> On 1/11/21 11:15 AM, Daniel Vetter wrote:
> > On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
> > > On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
> > > > Ok then, I guess I will proceed with the dummy pages list implementation then.
> > > > 
> > > > Andrey
> > > > 
> > > > ________________________________
> > > > From: Koenig, Christian <Christian.Koenig@amd.com>
> > > > Sent: 08 January 2021 09:52
> > > > To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
> > > > Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
> > > > Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
> > > > 
> > > > Mhm, I'm not aware of any let over pointer between TTM and GEM and we
> > > > worked quite hard on reducing the size of the amdgpu_bo, so another
> > > > extra pointer just for that corner case would suck quite a bit.
> > > We have a ton of other pointers in struct amdgpu_bo (or any of it's lower
> > > things) which are fairly single-use, so I'm really not much seeing the
> > > point in making this a special case. It also means the lifetime management
> > > becomes a bit iffy, since we can't throw away the dummy page then the last
> > > reference to the bo is released (since we don't track it there), but only
> > > when the last pointer to the device is released. Potentially this means a
> > > pile of dangling pages hanging around for too long.
> > Also if you really, really, really want to have this list, please don't
> > reinvent it since we have it already. drmm_ is exactly meant for resources
> > that should be freed when the final drm_device reference disappears.
> > -Daniel
> 
> 
> I maybe was too eager too early. See, I need to explicitly allocate the dummy
> page using alloc_page(), so I cannot use drmm_kmalloc for this. So once again,
> like with the list, I need to wrap it with a container struct which I can then
> allocate using drmm_kmalloc, and inside there will be a page pointer. But then
> on release it needs to free the page, and so I supposedly need to use
> drmm_add_action to free the page before the container struct is released, but
> drmm_kmalloc doesn't allow setting a release action on struct allocation. So I
> created a new drmm_kmalloc_with_action API function, but then you also need to
> supply the optional data pointer for the release action (the struct page in
> this case), and so this all becomes a bit overcomplicated (but doable). Is
> this extra API worth adding? Maybe it can be useful in general.

drmm_add_action_or_reset (for better control flow) has both a void * data
and a cleanup function (and it internally allocates the tracking structure
for that for you). So it should work as-is? Allocating a tracking structure
for our tracking structure for a page would definitely be a bit too much.

Essentially drmm_add_action is the kzalloc_with_action function you want,
as long as all you need is a single void * pointer (we could do the
kzalloc_with_action though, there's enough space, just no need yet for any
of the current users).
-Daniel

> 
> Andrey
> 
> 
> 
> > > If you need some ideas for redundant pointers:
> > > - destroy callback (kinda not cool to not have this const anyway), we
> > >    could refcount it all with the overall gem bo. Quite a bit of work.
> > > - bdev pointer, if we move the device ttm stuff into struct drm_device, or
> > >    create a common struct ttm_device, we can ditch that
> > > - We could probably merge a few of the fields and find 8 bytes somewhere
> > > - we still have 2 krefs, would probably need to fix that before we can
> > >    merge the destroy callbacks
> > > 
> > > So there's plenty of room still, if the size of a bo struct is really that
> > > critical. Imo it's not.
> > > 
> > > 
> > > > Christian.
> > > > 
> > > > Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> > > > > Daniel had some objections to this (see bellow) and so I guess I need
> > > > > you both to agree on the approach before I proceed.
> > > > > 
> > > > > Andrey
> > > > > 
> > > > > On 1/8/21 9:33 AM, Christian König wrote:
> > > > > > Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> > > > > > > Hey Christian, just a ping.
> > > > > > Was there any question for me here?
> > > > > > 
> > > > > > As far as I can see the best approach would still be to fill the VMA
> > > > > > with a single dummy page and avoid pointers in the GEM object.
> > > > > > 
> > > > > > Christian.
> > > > > > 
> > > > > > > Andrey
> > > > > > > 
> > > > > > > On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
> > > > > > > > On 1/7/21 11:30 AM, Daniel Vetter wrote:
> > > > > > > > > On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> > > > > > > > > > On 1/7/21 11:21 AM, Daniel Vetter wrote:
> > > > > > > > > > > On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> > > > > > > > > > > > On 11/23/20 3:01 AM, Christian König wrote:
> > > > > > > > > > > > > Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > On 11/21/20 9:15 AM, Christian König wrote:
> > > > > > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > > > Will be used to reroute CPU mapped BO's page faults once
> > > > > > > > > > > > > > > > device is removed.
> > > > > > > > > > > > > > > Uff, one page for each exported DMA-buf? That's not
> > > > > > > > > > > > > > > something we can do.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > We need to find a different approach here.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Can't we call alloc_page() on each fault and link them together
> > > > > > > > > > > > > > > so they are freed when the device is finally reaped?
> > > > > > > > > > > > > > For sure better to optimize and allocate on demand when we reach
> > > > > > > > > > > > > > this corner case, but why the linking ?
> > > > > > > > > > > > > > Shouldn't drm_prime_gem_destroy be good enough place to free ?
> > > > > > > > > > > > > I want to avoid keeping the page in the GEM object.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > What we can do is to allocate a page on demand for each fault
> > > > > > > > > > > > > and link
> > > > > > > > > > > > > the together in the bdev instead.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > And when the bdev is then finally destroyed after the last
> > > > > > > > > > > > > application
> > > > > > > > > > > > > closed we can finally release all of them.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Christian.
> > > > > > > > > > > > Hey, started to implement this and then realized that by
> > > > > > > > > > > > allocating a page
> > > > > > > > > > > > for each fault indiscriminately
> > > > > > > > > > > > we will be allocating a new page for each faulting virtual
> > > > > > > > > > > > address within a
> > > > > > > > > > > > VA range belonging the same BO
> > > > > > > > > > > > and this is obviously too much and not the intention. Should I
> > > > > > > > > > > > instead use
> > > > > > > > > > > > let's say a hashtable with the hash
> > > > > > > > > > > > key being faulting BO address to actually keep allocating and
> > > > > > > > > > > > reusing same
> > > > > > > > > > > > dummy zero page per GEM BO
> > > > > > > > > > > > (or for that matter DRM file object address for non imported
> > > > > > > > > > > > BOs) ?
> > > > > > > > > > > Why do we need a hashtable? All the sw structures to track this
> > > > > > > > > > > should
> > > > > > > > > > > still be around:
> > > > > > > > > > > - if gem_bo->dma_buf is set the buffer is currently exported as
> > > > > > > > > > > a dma-buf,
> > > > > > > > > > >      so defensively allocate a per-bo page
> > > > > > > > > > > - otherwise allocate a per-file page
> > > > > > > > > > That exactly what we have in current implementation
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > Or is the idea to save the struct page * pointer? That feels a
> > > > > > > > > > > bit like
> > > > > > > > > > > over-optimizing stuff. Better to have a simple implementation
> > > > > > > > > > > first and
> > > > > > > > > > > then tune it if (and only if) any part of it becomes a problem
> > > > > > > > > > > for normal
> > > > > > > > > > > usage.
> > > > > > > > > > Exactly - the idea is to avoid adding extra pointer to
> > > > > > > > > > drm_gem_object,
> > > > > > > > > > Christian suggested to instead keep a linked list of dummy pages
> > > > > > > > > > to be
> > > > > > > > > > allocated on demand once we hit a vm_fault. I will then also
> > > > > > > > > > prefault the entire
> > > > > > > > > > VA range from vma->vm_end - vma->vm_start to vma->vm_end and map
> > > > > > > > > > them
> > > > > > > > > > to that single dummy page.
> > > > > > > > > This strongly feels like premature optimization. If you're worried
> > > > > > > > > about
> > > > > > > > > the overhead on amdgpu, pay down the debt by removing one of the
> > > > > > > > > redundant
> > > > > > > > > pointers between gem and ttm bo structs (I think we still have
> > > > > > > > > some) :-)
> > > > > > > > > 
> > > > > > > > > Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> > > > > > > > > pointer just because" games with hashtables.
> > > > > > > > > -Daniel
> > > > > > > > 
> > > > > > > > Well, if you and Christian can agree on this approach and suggest
> > > > > > > > maybe what pointer is
> > > > > > > > redundant and can be removed from GEM struct so we can use the
> > > > > > > > 'credit' to add the dummy page
> > > > > > > > to GEM I will be happy to follow through.
> > > > > > > > 
> > > > > > > > P.S Hash table is off the table anyway and we are talking only
> > > > > > > > about linked list here since by prefaulting
> > > > > > > > the entire VA range for a vmf->vma i will be avoiding redundant
> > > > > > > > page faults to same VMA VA range and so
> > > > > > > > don't need to search and reuse an existing dummy page but simply
> > > > > > > > create a new one for each next fault.
> > > > > > > > 
> > > > > > > > Andrey
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
@ 2021-01-12  9:10                                     ` Daniel Vetter
  0 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2021-01-12  9:10 UTC (permalink / raw)
  To: Andrey Grodzovsky
  Cc: robh, daniel.vetter, dri-devel, eric, ppaalanen, amd-gfx,
	Daniel Vetter, gregkh, Deucher, Alexander, l.stach, Wentland,
	Harry, Koenig, Christian, yuq825

On Mon, Jan 11, 2021 at 03:45:10PM -0500, Andrey Grodzovsky wrote:
> 
> On 1/11/21 11:15 AM, Daniel Vetter wrote:
> > On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
> > > On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
> > > > Ok then, I guess I will proceed with the dummy pages list implementation then.
> > > > 
> > > > Andrey
> > > > 
> > > > ________________________________
> > > > From: Koenig, Christian <Christian.Koenig@amd.com>
> > > > Sent: 08 January 2021 09:52
> > > > To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
> > > > Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
> > > > Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
> > > > 
> > > > Mhm, I'm not aware of any left-over pointer between TTM and GEM and we
> > > > worked quite hard on reducing the size of the amdgpu_bo, so another
> > > > extra pointer just for that corner case would suck quite a bit.
> > > We have a ton of other pointers in struct amdgpu_bo (or any of its lower
> > > things) which are fairly single-use, so I'm really not much seeing the
> > > point in making this a special case. It also means the lifetime management
> > > becomes a bit iffy, since we can't throw away the dummy page when the last
> > > reference to the bo is released (since we don't track it there), but only
> > > when the last pointer to the device is released. Potentially this means a
> > > pile of dangling pages hanging around for too long.
> > Also if you really, really, really want to have this list, please don't
> > reinvent it since we have it already. drmm_ is exactly meant for resources
> > that should be freed when the final drm_device reference disappears.
> > -Daniel
> 
> 
> Maybe I was too eager too early: I need to explicitly allocate the dummy
> page with alloc_page(), so I cannot use drmm_kmalloc() for this. Once
> again, as with the list, I need to wrap it with a container struct which
> I can then allocate using drmm_kmalloc(), with the page pointer inside.
> But then on release it needs to free the page, so I supposedly need to
> use drmm_add_action to free the page before the container struct is
> released, and drmm_kmalloc doesn't allow setting a release action on
> struct allocation. So I created a new drmm_kmalloc_with_action() API
> function, but then you also need to supply the optional data pointer for
> the release action (the struct page in this case), and so this all
> becomes a bit overcomplicated (but doable). Is this extra API worth
> adding? Maybe it can be useful in general.

drmm_add_action_or_reset (for better control flow) has both a void * data
and a cleanup function (and it internally allocates the tracking structure
for that for you). So it should work as-is? Allocating a tracking structure
for our tracking structure for a page would definitely be a bit too much.

Essentially drmm_add_action is the kmalloc_with_action function you want,
as long as all you need is a single void * pointer (we could do the
kzalloc_with_action though, there's enough space, just no need yet for any
of the current users).
-Daniel
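
[For context, a minimal userspace model of the drmm_ managed-release idea
referenced above: a per-device list of (callback, data) pairs that run when
the last device reference drops. All names here (action, add_action,
release_all) are illustrative, not the actual DRM API.]

```c
#include <stdlib.h>

/* One pending cleanup action: a callback plus its opaque data, linked
 * into a per-"device" list. Models the drmm_ managed-release idea:
 * actions run when the last device reference goes away. */
struct action {
    void (*release)(void *data);
    void *data;
    struct action *next;
};

static struct action *actions;   /* head of the device's cleanup list */
static int pages_freed;          /* counts dummy pages reclaimed */

/* Register a cleanup; on allocation failure run it immediately,
 * mirroring the *_or_reset semantics described above. */
static int add_action(void (*release)(void *), void *data)
{
    struct action *a = malloc(sizeof(*a));

    if (!a) {
        release(data);           /* "or reset": clean up right away */
        return -1;
    }
    a->release = release;
    a->data = data;
    a->next = actions;
    actions = a;
    return 0;
}

/* Run and free every action, as when the final reference drops. */
static void release_all(void)
{
    while (actions) {
        struct action *a = actions;

        actions = a->next;
        a->release(a->data);
        free(a);
    }
}

static void free_dummy_page(void *page)
{
    free(page);                  /* stands in for __free_page() */
    pages_freed++;
}

/* Two dummy "pages" allocated on demand, both reclaimed at release. */
static int demo(void)
{
    add_action(free_dummy_page, malloc(4096));
    add_action(free_dummy_page, malloc(4096));
    release_all();
    return pages_freed;
}
```

[The single void * data pointer is all the tracking a dummy page needs, which
is why no extra kmalloc_with_action variant is required.]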

> 
> Andrey
> 
> 
> 
> > > If you need some ideas for redundant pointers:
> > > - destroy callback (kinda not cool to not have this const anyway), we
> > >    could refcount it all with the overall gem bo. Quite a bit of work.
> > > - bdev pointer, if we move the device ttm stuff into struct drm_device, or
> > >    create a common struct ttm_device, we can ditch that
> > > - We could probably merge a few of the fields and find 8 bytes somewhere
> > > - we still have 2 krefs, would probably need to fix that before we can
> > >    merge the destroy callbacks
> > > 
> > > So there's plenty of room still, if the size of a bo struct is really that
> > > critical. Imo it's not.
> > > 
> > > 
> > > > Christian.
> > > > 
> > > > Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> > > > > Daniel had some objections to this (see below) and so I guess I need
> > > > > you both to agree on the approach before I proceed.
> > > > > 
> > > > > Andrey
> > > > > 
> > > > > On 1/8/21 9:33 AM, Christian König wrote:
> > > > > > Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> > > > > > > Hey Christian, just a ping.
> > > > > > Was there any question for me here?
> > > > > > 
> > > > > > As far as I can see the best approach would still be to fill the VMA
> > > > > > with a single dummy page and avoid pointers in the GEM object.
> > > > > > 
> > > > > > Christian.
> > > > > > 
> > > > > > > Andrey
> > > > > > > 
> > > > > > > On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
> > > > > > > > On 1/7/21 11:30 AM, Daniel Vetter wrote:
> > > > > > > > > On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> > > > > > > > > > On 1/7/21 11:21 AM, Daniel Vetter wrote:
> > > > > > > > > > > On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> > > > > > > > > > > > On 11/23/20 3:01 AM, Christian König wrote:
> > > > > > > > > > > > > Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > On 11/21/20 9:15 AM, Christian König wrote:
> > > > > > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > > > > > Will be used to reroute CPU mapped BO's page faults once
> > > > > > > > > > > > > > > > device is removed.
> > > > > > > > > > > > > > > Uff, one page for each exported DMA-buf? That's not
> > > > > > > > > > > > > > > something we can do.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > We need to find a different approach here.
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Can't we call alloc_page() on each fault and link them together
> > > > > > > > > > > > > > > so they are freed when the device is finally reaped?
> > > > > > > > > > > > > > For sure better to optimize and allocate on demand when we reach
> > > > > > > > > > > > > > this corner case, but why the linking ?
> > > > > > > > > > > > > > Shouldn't drm_prime_gem_destroy be good enough place to free ?
> > > > > > > > > > > > > I want to avoid keeping the page in the GEM object.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > What we can do is to allocate a page on demand for each fault
> > > > > > > > > > > > > and link
> > > > > > > > > > > > > them together in the bdev instead.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > And when the bdev is then finally destroyed after the last
> > > > > > > > > > > > > application
> > > > > > > > > > > > > closed we can finally release all of them.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Christian.
> > > > > > > > > > > > Hey, started to implement this and then realized that by
> > > > > > > > > > > > allocating a page
> > > > > > > > > > > > for each fault indiscriminately
> > > > > > > > > > > > we will be allocating a new page for each faulting virtual
> > > > > > > > > > > > address within a
> > > > > > > > > > > > VA range belonging to the same BO
> > > > > > > > > > > > and this is obviously too much and not the intention. Should I
> > > > > > > > > > > > instead use
> > > > > > > > > > > > let's say a hashtable with the hash
> > > > > > > > > > > > key being faulting BO address to actually keep allocating and
> > > > > > > > > > > > reusing same
> > > > > > > > > > > > dummy zero page per GEM BO
> > > > > > > > > > > > (or for that matter DRM file object address for non imported
> > > > > > > > > > > > BOs) ?
> > > > > > > > > > > Why do we need a hashtable? All the sw structures to track this
> > > > > > > > > > > should
> > > > > > > > > > > still be around:
> > > > > > > > > > > - if gem_bo->dma_buf is set the buffer is currently exported as
> > > > > > > > > > > a dma-buf,
> > > > > > > > > > >      so defensively allocate a per-bo page
> > > > > > > > > > > - otherwise allocate a per-file page
> > > > > > > > > > That's exactly what we have in the current implementation
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > Or is the idea to save the struct page * pointer? That feels a
> > > > > > > > > > > bit like
> > > > > > > > > > > over-optimizing stuff. Better to have a simple implementation
> > > > > > > > > > > first and
> > > > > > > > > > > then tune it if (and only if) any part of it becomes a problem
> > > > > > > > > > > for normal
> > > > > > > > > > > usage.
> > > > > > > > > > Exactly - the idea is to avoid adding extra pointer to
> > > > > > > > > > drm_gem_object,
> > > > > > > > > > Christian suggested to instead keep a linked list of dummy pages
> > > > > > > > > > to be
> > > > > > > > > > allocated on demand once we hit a vm_fault. I will then also
> > > > > > > > > > prefault the entire
> > > > > > > > > > VA range from vma->vm_start to vma->vm_end and map
> > > > > > > > > > them
> > > > > > > > > > to that single dummy page.
> > > > > > > > > This strongly feels like premature optimization. If you're worried
> > > > > > > > > about
> > > > > > > > > the overhead on amdgpu, pay down the debt by removing one of the
> > > > > > > > > redundant
> > > > > > > > > pointers between gem and ttm bo structs (I think we still have
> > > > > > > > > some) :-)
> > > > > > > > > 
> > > > > > > > > Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> > > > > > > > > pointer just because" games with hashtables.
> > > > > > > > > -Daniel
> > > > > > > > 
> > > > > > > > Well, if you and Christian can agree on this approach and suggest
> > > > > > > > maybe what pointer is
> > > > > > > > redundant and can be removed from GEM struct so we can use the
> > > > > > > > 'credit' to add the dummy page
> > > > > > > > to GEM I will be happy to follow through.
> > > > > > > > 
> > > > > > > > P.S Hash table is off the table anyway and we are talking only
> > > > > > > > about linked list here since by prefaulting
> > > > > > > > the entire VA range for a vmf->vma i will be avoiding redundant
> > > > > > > > page faults to same VMA VA range and so
> > > > > > > > don't need to search and reuse an existing dummy page but simply
> > > > > > > > create a new one for each next fault.
> > > > > > > > 
> > > > > > > > Andrey
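
[A sketch of the prefaulting scheme from the quoted exchange: on the first
fault in a post-remove VMA, allocate one dummy page and point every
page-table slot in the range at it, so later accesses neither fault again
nor allocate more pages. This is a userspace model, assuming an array in
place of the page table; the names are illustrative, not the kernel fault
handler API.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NPAGES 8                 /* pages spanned by the modeled VMA */

static void *pt[NPAGES];         /* stands in for the VMA's page-table slots */
static int pages_allocated;      /* counts alloc_page() equivalents */

/* Model of the post-remove fault handler: on the first fault anywhere
 * in the range, allocate a single zeroed dummy page and prefault the
 * whole range to it; later accesses find a mapping and allocate
 * nothing more. */
static void *fault(size_t pgoff)
{
    if (!pt[pgoff]) {
        void *dummy = calloc(1, 4096);   /* stands in for alloc_page() */

        pages_allocated++;
        for (size_t i = 0; i < NPAGES; i++)
            pt[i] = dummy;               /* prefault vm_start..vm_end */
    }
    return pt[pgoff];
}

/* Faults at two different offsets resolve to the same dummy page. */
static int demo(void)
{
    void *a = fault(0);
    void *b = fault(5);

    assert(a == b);
    free(a);                     /* one page per VMA to clean up */
    return pages_allocated;
}
```

[This is why no hashtable or search is needed: one allocation covers the
whole VMA, and a plain list suffices to free the pages at device release.]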
> > > -- 
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-12  8:12                                 ` Christian König
@ 2021-01-12  9:13                                   ` Daniel Vetter
  -1 siblings, 0 replies; 212+ messages in thread
From: Daniel Vetter @ 2021-01-12  9:13 UTC (permalink / raw)
  To: Christian König
  Cc: amd-gfx, gregkh, dri-devel, yuq825, Deucher, Alexander

On Tue, Jan 12, 2021 at 9:12 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 11.01.21 um 17:13 schrieb Daniel Vetter:
> > On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
> >> Ok then, I guess I will proceed with the dummy pages list implementation then.
> >>
> >> Andrey
> >>
> >> ________________________________
> >> From: Koenig, Christian <Christian.Koenig@amd.com>
> >> Sent: 08 January 2021 09:52
> >> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
> >> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
> >> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
> >>
> >> Mhm, I'm not aware of any left-over pointer between TTM and GEM and we
> >> worked quite hard on reducing the size of the amdgpu_bo, so another
> >> extra pointer just for that corner case would suck quite a bit.
> > We have a ton of other pointers in struct amdgpu_bo (or any of its lower
> > things) which are fairly single-use, so I'm really not much seeing the
> > point in making this a special case. It also means the lifetime management
> > becomes a bit iffy, since we can't throw away the dummy page when the last
> > reference to the bo is released (since we don't track it there), but only
> > when the last pointer to the device is released. Potentially this means a
> > pile of dangling pages hanging around for too long.
>
> Yeah, all of them are already on my TODO list, but see below.
>
> > If you need some ideas for redundant pointers:
> > - destroy callback (kinda not cool to not have this const anyway), we
> >    could refcount it all with the overall gem bo. Quite a bit of work.
>
> The bigger problem is that TTM-based drivers are using the destroy
> callback pointer to distinguish ghost objects from real ones.
>
> We first need to get rid of those. I already have a plan for that and
> ~20% of it implemented, but it is more complicated because of the driver
> specific backends in Nouveau, Amdgpu and vmwgfx.
>
> > - bdev pointer, if we move the device ttm stuff into struct drm_device, or
> >    create a common struct ttm_device, we can ditch that
>
> Yes, exactly that's what my device structure rename patch set is aiming
> for :)

Hm already on the list and did I miss it?

> > - We could probably merge a few of the fields and find 8 bytes somewhere
>
> Please point out where.

Flags and bool deleted looked compressible at a glance. Not sure
that's worth it.

> > - we still have 2 krefs, would probably need to fix that before we can
> >    merge the destroy callbacks
>
> Yes, already on my TODO list as well. But the last time I looked into
> this I was blocked by the struct_mutex once more.

Uh struct_mutex, I thought we've killed that for good. How is it
getting in the way?

> > So there's plenty of room still, if the size of a bo struct is really that
> > critical. Imo it's not.
>
> It is. See we had a size of struct amdgpu_bo of over 1500 bytes because
> we stopped caring for that; now we are down to 816 at the moment.
>
> We really need to get rid of this duplication of functionality and
> structure between TTM and GEM.

Yeah, and if you have patches nag me, happy to review them anytime really.

Cheers, Daniel
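
[An aside on the "flags and bool deleted looked compressible" remark above:
it refers to folding a standalone bool into a spare bit of an existing flags
word to recover a padded slot. A generic illustration only — the
BO_FLAG_DELETED bit and the struct layout here are hypothetical, not the
real ttm_buffer_object.]

```c
#include <stdint.h>

/* A standalone "bool deleted" next to a 32-bit flags word costs a
 * full padded slot; folding it into a spare flag bit frees that slot.
 * Bit position and struct are assumptions for illustration. */
#define BO_FLAG_DELETED (1u << 31)   /* assumed-spare bit */

struct bo {
    uint32_t flags;              /* carries "deleted" as a bit */
};

static void set_deleted(struct bo *bo)
{
    bo->flags |= BO_FLAG_DELETED;
}

static int is_deleted(const struct bo *bo)
{
    return !!(bo->flags & BO_FLAG_DELETED);
}

static int demo(void)
{
    struct bo bo = { .flags = 0 };

    set_deleted(&bo);
    return is_deleted(&bo);
}
```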

>
> Christian.
>
> > -Daniel
> >
> >
> >> Christian.
> >>
> >> Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> >>> Daniel had some objections to this (see below) and so I guess I need
> >>> you both to agree on the approach before I proceed.
> >>>
> >>> Andrey
> >>>
> >>> On 1/8/21 9:33 AM, Christian König wrote:
> >>>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> >>>>> Hey Christian, just a ping.
> >>>> Was there any question for me here?
> >>>>
> >>>> As far as I can see the best approach would still be to fill the VMA
> >>>> with a single dummy page and avoid pointers in the GEM object.
> >>>>
> >>>> Christian.
> >>>>
> >>>>> Andrey
> >>>>>
> >>>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
> >>>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
> >>>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> >>>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
> >>>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> >>>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
> >>>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> >>>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
> >>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> >>>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
> >>>>>>>>>>>>>> device is removed.
> >>>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
> >>>>>>>>>>>>> something we can do.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We need to find a different approach here.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
> >>>>>>>>>>>>> so they are freed when the device is finally reaped?
> >>>>>>>>>>>> For sure better to optimize and allocate on demand when we reach
> >>>>>>>>>>>> this corner case, but why the linking ?
> >>>>>>>>>>>> Shouldn't drm_prime_gem_destroy be good enough place to free ?
> >>>>>>>>>>> I want to avoid keeping the page in the GEM object.
> >>>>>>>>>>>
> >>>>>>>>>>> What we can do is to allocate a page on demand for each fault
> >>>>>>>>>>> and link
> >>>>>>>>>>> them together in the bdev instead.
> >>>>>>>>>>>
> >>>>>>>>>>> And when the bdev is then finally destroyed after the last
> >>>>>>>>>>> application
> >>>>>>>>>>> closed we can finally release all of them.
> >>>>>>>>>>>
> >>>>>>>>>>> Christian.
> >>>>>>>>>> Hey, started to implement this and then realized that by
> >>>>>>>>>> allocating a page
> >>>>>>>>>> for each fault indiscriminately
> >>>>>>>>>> we will be allocating a new page for each faulting virtual
> >>>>>>>>>> address within a
> >>>>>>>>>> VA range belonging to the same BO
> >>>>>>>>>> and this is obviously too much and not the intention. Should I
> >>>>>>>>>> instead use
> >>>>>>>>>> let's say a hashtable with the hash
> >>>>>>>>>> key being faulting BO address to actually keep allocating and
> >>>>>>>>>> reusing same
> >>>>>>>>>> dummy zero page per GEM BO
> >>>>>>>>>> (or for that matter DRM file object address for non imported
> >>>>>>>>>> BOs) ?
> >>>>>>>>> Why do we need a hashtable? All the sw structures to track this
> >>>>>>>>> should
> >>>>>>>>> still be around:
> >>>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
> >>>>>>>>> a dma-buf,
> >>>>>>>>>      so defensively allocate a per-bo page
> >>>>>>>>> - otherwise allocate a per-file page
> >>>>>>>> That's exactly what we have in the current implementation
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Or is the idea to save the struct page * pointer? That feels a
> >>>>>>>>> bit like
> >>>>>>>>> over-optimizing stuff. Better to have a simple implementation
> >>>>>>>>> first and
> >>>>>>>>> then tune it if (and only if) any part of it becomes a problem
> >>>>>>>>> for normal
> >>>>>>>>> usage.
> >>>>>>>> Exactly - the idea is to avoid adding extra pointer to
> >>>>>>>> drm_gem_object,
> >>>>>>>> Christian suggested to instead keep a linked list of dummy pages
> >>>>>>>> to be
> >>>>>>>> allocated on demand once we hit a vm_fault. I will then also
> >>>>>>>> prefault the entire
> >>>>>>>> VA range from vma->vm_start to vma->vm_end and map
> >>>>>>>> them
> >>>>>>>> to that single dummy page.
> >>>>>>> This strongly feels like premature optimization. If you're worried
> >>>>>>> about
> >>>>>>> the overhead on amdgpu, pay down the debt by removing one of the
> >>>>>>> redundant
> >>>>>>> pointers between gem and ttm bo structs (I think we still have
> >>>>>>> some) :-)
> >>>>>>>
> >>>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> >>>>>>> pointer just because" games with hashtables.
> >>>>>>> -Daniel
> >>>>>>
> >>>>>> Well, if you and Christian can agree on this approach and suggest
> >>>>>> maybe what pointer is
> >>>>>> redundant and can be removed from GEM struct so we can use the
> >>>>>> 'credit' to add the dummy page
> >>>>>> to GEM I will be happy to follow through.
> >>>>>>
> >>>>>> P.S Hash table is off the table anyway and we are talking only
> >>>>>> about linked list here since by prefaulting
> >>>>>> the entire VA range for a vmf->vma i will be avoiding redundant
> >>>>>> page faults to same VMA VA range and so
> >>>>>> don't need to search and reuse an existing dummy page but simply
> >>>>>> create a new one for each next fault.
> >>>>>>
> >>>>>> Andrey
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread

* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-12  9:10                                     ` Daniel Vetter
@ 2021-01-12 12:32                                       ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2021-01-12 12:32 UTC (permalink / raw)
  To: Daniel Vetter, Andrey Grodzovsky
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Deucher, Alexander, yuq825

Am 12.01.21 um 10:10 schrieb Daniel Vetter:
> On Mon, Jan 11, 2021 at 03:45:10PM -0500, Andrey Grodzovsky wrote:
>> On 1/11/21 11:15 AM, Daniel Vetter wrote:
>>> On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
>>>> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>>>>> Ok then, I guess I will proceed with the dummy pages list implementation then.
>>>>>
>>>>> Andrey
>>>>>
>>>>> ________________________________
>>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>>> Sent: 08 January 2021 09:52
>>>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter <daniel@ffwll.ch>
>>>>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; Deucher, Alexander <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
>>>>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
>>>>>
> >>>>> Mhm, I'm not aware of any left-over pointer between TTM and GEM and we
>>>>> worked quite hard on reducing the size of the amdgpu_bo, so another
>>>>> extra pointer just for that corner case would suck quite a bit.
>>>> We have a ton of other pointers in struct amdgpu_bo (or any of its lower
>>>> things) which are fairly single-use, so I'm really not much seeing the
>>>> point in making this a special case. It also means the lifetime management
>>>> becomes a bit iffy, since we can't throw away the dummy page when the last
>>>> reference to the bo is released (since we don't track it there), but only
>>>> when the last pointer to the device is released. Potentially this means a
>>>> pile of dangling pages hanging around for too long.
>>> Also if you really, really, really want to have this list, please don't
>>> reinvent it since we have it already. drmm_ is exactly meant for resources
>>> that should be freed when the final drm_device reference disappears.
>>> -Daniel
>>
>> I may have been too eager. I need to explicitly allocate the dummy page
>> using page_alloc, so I cannot use drmm_kmalloc for this. As with the list,
>> I would need to wrap it in a container struct which I can then allocate
>> using drmm_kmalloc, with the page pointer inside. But then on release it
>> needs to free the page, so I supposedly need drmm_add_action to free the
>> page before the container struct is released, and drmm_kmalloc doesn't
>> allow setting a release action on struct allocation. So I created a new
>> drmm_kmalloc_with_action API function, but then you also need to supply
>> the optional data pointer for the release action (the struct page in this
>> case), and this all becomes a bit overcomplicated (but doable). Is this
>> extra API worth adding? Maybe it could be useful in general.
> drmm_add_action_or_reset (for better control flow) has both a void * data
> and a cleanup function (and it internally allocates the tracking structure
> for that for you). So should work as-is? Allocating a tracking structure
> for our tracking structure for a page would definitely be a bit too much.
>
> Essentially drmm_add_action is the kcalloc_with_action function you want,
> as long as all you need is a single void * pointer (we could do the
> kzalloc_with_action though, there's enough space, just no need yet for any
> of the current users).

Yeah, but my thinking was that we should use the page LRU for this and 
not another container structure.

Christian.

> -Daniel



* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-12  9:10                                     ` Daniel Vetter
@ 2021-01-12 15:54                                       ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-12 15:54 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Deucher, Alexander,
	Koenig, Christian, yuq825

So - basically allocate the page and pass it as a void* pointer to drmm_add_action
with a release function which will free the page, right?

Andrey

On 1/12/21 4:10 AM, Daniel Vetter wrote:
> drm_add_action_or_reset (for better control flow) has both a void * data
> and a cleanup function (and it internally allocates the tracking structure
> for that for you). So should work as-is? Allocating a tracking structure
> for our tracking structure for a page would definitely be a bit too much.
>
> Essentiall drmm_add_action is your kcalloc_with_action function you want,
> as long as all you need is a single void * pointer (we could do the
> kzalloc_with_action though, there's enough space, just no need yet for any
> of the current users).
> -Daniel


* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-12 12:32                                       ` Christian König
@ 2021-01-12 15:59                                         ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-12 15:59 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Deucher, Alexander, yuq825


On 1/12/21 7:32 AM, Christian König wrote:
> Am 12.01.21 um 10:10 schrieb Daniel Vetter:
>> On Mon, Jan 11, 2021 at 03:45:10PM -0500, Andrey Grodzovsky wrote:
>>> On 1/11/21 11:15 AM, Daniel Vetter wrote:
>>>> On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
>>>>> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>>>>>> Ok then, I guess I will proceed with the dummy pages list implementation 
>>>>>> then.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>> ________________________________
>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>>>> Sent: 08 January 2021 09:52
>>>>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter 
>>>>>> <daniel@ffwll.ch>
>>>>>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; 
>>>>>> dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; 
>>>>>> daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org 
>>>>>> <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; 
>>>>>> yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; 
>>>>>> Deucher, Alexander <Alexander.Deucher@amd.com>; 
>>>>>> gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; 
>>>>>> ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry 
>>>>>> <Harry.Wentland@amd.com>
>>>>>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
>>>>>>
>>>>>> Mhm, I'm not aware of any let over pointer between TTM and GEM and we
>>>>>> worked quite hard on reducing the size of the amdgpu_bo, so another
>>>>>> extra pointer just for that corner case would suck quite a bit.
>>>>> We have a ton of other pointers in struct amdgpu_bo (or any of it's lower
>>>>> things) which are fairly single-use, so I'm really not much seeing the
>>>>> point in making this a special case. It also means the lifetime management
>>>>> becomes a bit iffy, since we can't throw away the dummy page then the last
>>>>> reference to the bo is released (since we don't track it there), but only
>>>>> when the last pointer to the device is released. Potentially this means a
>>>>> pile of dangling pages hanging around for too long.
>>>> Also if you really, really, really want to have this list, please don't
>>>> reinvent it since we have it already. drmm_ is exactly meant for resources
>>>> that should be freed when the final drm_device reference disappears.
>>>> -Daniel
>>>
>>> I maybe was eager to early, see i need to explicitly allocate the dummy page
>>> using page_alloc so
>>> i cannot use drmm_kmalloc for this, so once again like with the list i need
>>> to wrap it with a container struct
>>> which i can then allocate using drmm_kmalloc and inside there will be page
>>> pointer. But then
>>> on release it needs to free the page and so i supposedly need to use 
>>> drmm_add_action
>>> to free the page before the container struct is released but drmm_kmalloc
>>> doesn't allow to set
>>> release action on struct allocation. So I created a new
>>> drmm_kmalloc_with_action API function
>>> but then you also need to supply the optional data pointer for the release
>>> action (the struct page in this case)
>>> and so this all becomes a bit overcomplicated (but doable). Is this extra
>>> API worth adding ? Maybe it can
>>> be useful in general.
>> drm_add_action_or_reset (for better control flow) has both a void * data
>> and a cleanup function (and it internally allocates the tracking structure
>> for that for you). So should work as-is? Allocating a tracking structure
>> for our tracking structure for a page would definitely be a bit too much.
>>
>> Essentiall drmm_add_action is your kcalloc_with_action function you want,
>> as long as all you need is a single void * pointer (we could do the
>> kzalloc_with_action though, there's enough space, just no need yet for any
>> of the current users).
>
> Yeah, but my thinking was that we should use the page LRU for this and not 
> another container structure.
>
> Christian.


Which specific list did you mean?

Andrey


>
>> -Daniel
>


* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-12 15:59                                         ` Andrey Grodzovsky
@ 2021-01-13  9:14                                           ` Christian König
  -1 siblings, 0 replies; 212+ messages in thread
From: Christian König @ 2021-01-13  9:14 UTC (permalink / raw)
  To: Andrey Grodzovsky, Daniel Vetter
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Deucher, Alexander, yuq825

Am 12.01.21 um 16:59 schrieb Andrey Grodzovsky:
>
> On 1/12/21 7:32 AM, Christian König wrote:
>> Am 12.01.21 um 10:10 schrieb Daniel Vetter:
>>> On Mon, Jan 11, 2021 at 03:45:10PM -0500, Andrey Grodzovsky wrote:
>>>> On 1/11/21 11:15 AM, Daniel Vetter wrote:
>>>>> On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
>>>>>> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>>>>>>> Ok then, I guess I will proceed with the dummy pages list 
>>>>>>> implementation then.
>>>>>>>
>>>>>>> Andrey
>>>>>>>
>>>>>>> ________________________________
>>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>>>>> Sent: 08 January 2021 09:52
>>>>>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel 
>>>>>>> Vetter <daniel@ffwll.ch>
>>>>>>> Cc: amd-gfx@lists.freedesktop.org 
>>>>>>> <amd-gfx@lists.freedesktop.org>; dri-devel@lists.freedesktop.org 
>>>>>>> <dri-devel@lists.freedesktop.org>; daniel.vetter@ffwll.ch 
>>>>>>> <daniel.vetter@ffwll.ch>; robh@kernel.org <robh@kernel.org>; 
>>>>>>> l.stach@pengutronix.de <l.stach@pengutronix.de>; 
>>>>>>> yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net 
>>>>>>> <eric@anholt.net>; Deucher, Alexander 
>>>>>>> <Alexander.Deucher@amd.com>; gregkh@linuxfoundation.org 
>>>>>>> <gregkh@linuxfoundation.org>; ppaalanen@gmail.com 
>>>>>>> <ppaalanen@gmail.com>; Wentland, Harry <Harry.Wentland@amd.com>
>>>>>>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or 
>>>>>>> GEM object
>>>>>>>
>>>>>>> Mhm, I'm not aware of any let over pointer between TTM and GEM 
>>>>>>> and we
>>>>>>> worked quite hard on reducing the size of the amdgpu_bo, so another
>>>>>>> extra pointer just for that corner case would suck quite a bit.
>>>>>> We have a ton of other pointers in struct amdgpu_bo (or any of 
>>>>>> it's lower
>>>>>> things) which are fairly single-use, so I'm really not much 
>>>>>> seeing the
>>>>>> point in making this a special case. It also means the lifetime 
>>>>>> management
>>>>>> becomes a bit iffy, since we can't throw away the dummy page then 
>>>>>> the last
>>>>>> reference to the bo is released (since we don't track it there), 
>>>>>> but only
>>>>>> when the last pointer to the device is released. Potentially this 
>>>>>> means a
>>>>>> pile of dangling pages hanging around for too long.
>>>>> Also if you really, really, really want to have this list, please 
>>>>> don't
>>>>> reinvent it since we have it already. drmm_ is exactly meant for 
>>>>> resources
>>>>> that should be freed when the final drm_device reference disappears.
>>>>> -Daniel
>>>>
>>>> I maybe was eager to early, see i need to explicitly allocate the 
>>>> dummy page
>>>> using page_alloc so
>>>> i cannot use drmm_kmalloc for this, so once again like with the 
>>>> list i need
>>>> to wrap it with a container struct
>>>> which i can then allocate using drmm_kmalloc and inside there will 
>>>> be page
>>>> pointer. But then
>>>> on release it needs to free the page and so i supposedly need to 
>>>> use drmm_add_action
>>>> to free the page before the container struct is released but 
>>>> drmm_kmalloc
>>>> doesn't allow to set
>>>> release action on struct allocation. So I created a new
>>>> drmm_kmalloc_with_action API function
>>>> but then you also need to supply the optional data pointer for the 
>>>> release
>>>> action (the struct page in this case)
>>>> and so this all becomes a bit overcomplicated (but doable). Is this 
>>>> extra
>>>> API worth adding ? Maybe it can
>>>> be useful in general.
>>> drm_add_action_or_reset (for better control flow) has both a void * 
>>> data
>>> and a cleanup function (and it internally allocates the tracking 
>>> structure
>>> for that for you). So should work as-is? Allocating a tracking 
>>> structure
>>> for our tracking structure for a page would definitely be a bit too 
>>> much.
>>>
>>> Essentiall drmm_add_action is your kcalloc_with_action function you 
>>> want,
>>> as long as all you need is a single void * pointer (we could do the
>>> kzalloc_with_action though, there's enough space, just no need yet 
>>> for any
>>> of the current users).
>>
>> Yeah, but my thinking was that we should use the page LRU for this 
>> and not another container structure.
>>
>> Christian.
>
>
> Which specific list did you mean ?

The struct page * you get from get_free_page() already has an lru member 
of type list_head.

This way you can link pages together for later destruction without the 
need of a container object.

Christian.

>
> Andrey
>
>
>>
>>> -Daniel
>>



* Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
  2021-01-13  9:14                                           ` Christian König
@ 2021-01-13 14:40                                             ` Andrey Grodzovsky
  -1 siblings, 0 replies; 212+ messages in thread
From: Andrey Grodzovsky @ 2021-01-13 14:40 UTC (permalink / raw)
  To: Christian König, Daniel Vetter
  Cc: daniel.vetter, amd-gfx, dri-devel, gregkh, Deucher, Alexander, yuq825


On 1/13/21 4:14 AM, Christian König wrote:
> Am 12.01.21 um 16:59 schrieb Andrey Grodzovsky:
>>
>> On 1/12/21 7:32 AM, Christian König wrote:
>>> Am 12.01.21 um 10:10 schrieb Daniel Vetter:
>>>> On Mon, Jan 11, 2021 at 03:45:10PM -0500, Andrey Grodzovsky wrote:
>>>>> On 1/11/21 11:15 AM, Daniel Vetter wrote:
>>>>>> On Mon, Jan 11, 2021 at 05:13:56PM +0100, Daniel Vetter wrote:
>>>>>>> On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
>>>>>>>> Ok then, I guess I will proceed with the dummy pages list 
>>>>>>>> implementation then.
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>> ________________________________
>>>>>>>> From: Koenig, Christian <Christian.Koenig@amd.com>
>>>>>>>> Sent: 08 January 2021 09:52
>>>>>>>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@amd.com>; Daniel Vetter 
>>>>>>>> <daniel@ffwll.ch>
>>>>>>>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>; 
>>>>>>>> dri-devel@lists.freedesktop.org <dri-devel@lists.freedesktop.org>; 
>>>>>>>> daniel.vetter@ffwll.ch <daniel.vetter@ffwll.ch>; robh@kernel.org 
>>>>>>>> <robh@kernel.org>; l.stach@pengutronix.de <l.stach@pengutronix.de>; 
>>>>>>>> yuq825@gmail.com <yuq825@gmail.com>; eric@anholt.net <eric@anholt.net>; 
>>>>>>>> Deucher, Alexander <Alexander.Deucher@amd.com>; 
>>>>>>>> gregkh@linuxfoundation.org <gregkh@linuxfoundation.org>; 
>>>>>>>> ppaalanen@gmail.com <ppaalanen@gmail.com>; Wentland, Harry 
>>>>>>>> <Harry.Wentland@amd.com>
>>>>>>>> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
>>>>>>>>
>>>>>>>> Mhm, I'm not aware of any let over pointer between TTM and GEM and we
>>>>>>>> worked quite hard on reducing the size of the amdgpu_bo, so another
>>>>>>>> extra pointer just for that corner case would suck quite a bit.
>>>>>>> We have a ton of other pointers in struct amdgpu_bo (or any of its lower
>>>>>>> levels) which are fairly single-use, so I'm really not seeing the
>>>>>>> point in making this a special case. It also means the lifetime management
>>>>>>> becomes a bit iffy, since we can't throw away the dummy page when the last
>>>>>>> reference to the bo is released (since we don't track it there), but only
>>>>>>> when the last pointer to the device is released. Potentially this means a
>>>>>>> pile of dangling pages hanging around for too long.
>>>>>> Also if you really, really, really want to have this list, please don't
>>>>>> reinvent it since we have it already. drmm_ is exactly meant for resources
>>>>>> that should be freed when the final drm_device reference disappears.
>>>>>> -Daniel
>>>>>
>>>>> I may have been too eager too early: I need to explicitly allocate the
>>>>> dummy page with alloc_page(), so I cannot use drmm_kmalloc() for it.
>>>>> Once again, as with the list, I would need to wrap it in a container
>>>>> struct which I can then allocate with drmm_kmalloc() and which holds
>>>>> the page pointer. But on release the page itself must be freed, so I
>>>>> supposedly need drmm_add_action to free the page before the container
>>>>> struct is released, and drmm_kmalloc doesn't allow setting a release
>>>>> action at allocation time. So I created a new drmm_kmalloc_with_action()
>>>>> API function, but then you also need to supply the optional data pointer
>>>>> for the release action (the struct page in this case), and this all
>>>>> becomes a bit overcomplicated (but doable). Is this extra API worth
>>>>> adding? Maybe it can be useful in general.
>>>> drmm_add_action_or_reset (for better control flow) has both a void * data
>>>> and a cleanup function (and it internally allocates the tracking structure
>>>> for that for you). So it should work as-is? Allocating a tracking structure
>>>> for our tracking structure for a page would definitely be a bit too much.
>>>>
>>>> Essentially drmm_add_action is the kmalloc_with_action function you want,
>>>> as long as all you need is a single void * pointer (we could do the
>>>> kzalloc_with_action variant though, there's enough space, just no need yet
>>>> for any of the current users).
>>>
>>> Yeah, but my thinking was that we should use the page LRU for this and not 
>>> another container structure.
>>>
>>> Christian.
>>
>>
>> Which specific list did you mean ?
>
> The struct page * you get from alloc_page() already has an lru member of
> type list_head.
>
> This way you can link pages together for later destruction without the need
> for a container object.
>
> Christian.


I get it now; this is good advice, and it indeed makes the container struct
I created obsolete. For now, though, I am going with Daniel's suggestion to
use drmm_add_action_or_reset, which makes the list itself unneeded as well.

Andrey


>
>>
>> Andrey
>>
>>
>>>
>>>> -Daniel
>>>
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 212+ messages in thread


end of thread, other threads:[~2021-01-13 14:40 UTC | newest]

Thread overview: 212+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-21  5:21 [PATCH v3 00/12] RFC Support hot device unplug in amdgpu Andrey Grodzovsky
2020-11-21  5:21 ` Andrey Grodzovsky
2020-11-21  5:21 ` [PATCH v3 01/12] drm: Add dummy page per device or GEM object Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-21 14:15   ` Christian König
2020-11-21 14:15     ` Christian König
2020-11-23  4:54     ` Andrey Grodzovsky
2020-11-23  4:54       ` Andrey Grodzovsky
2020-11-23  8:01       ` Christian König
2020-11-23  8:01         ` Christian König
2021-01-05 21:04         ` Andrey Grodzovsky
2021-01-05 21:04           ` Andrey Grodzovsky
2021-01-07 16:21           ` Daniel Vetter
2021-01-07 16:21             ` Daniel Vetter
2021-01-07 16:26             ` Andrey Grodzovsky
2021-01-07 16:26               ` Andrey Grodzovsky
2021-01-07 16:28               ` Andrey Grodzovsky
2021-01-07 16:28                 ` Andrey Grodzovsky
2021-01-07 16:30               ` Daniel Vetter
2021-01-07 16:30                 ` Daniel Vetter
2021-01-07 16:37                 ` Andrey Grodzovsky
2021-01-07 16:37                   ` Andrey Grodzovsky
2021-01-08 14:26                   ` Andrey Grodzovsky
2021-01-08 14:26                     ` Andrey Grodzovsky
2021-01-08 14:33                     ` Christian König
2021-01-08 14:33                       ` Christian König
2021-01-08 14:46                       ` Andrey Grodzovsky
2021-01-08 14:46                         ` Andrey Grodzovsky
2021-01-08 14:52                         ` Christian König
2021-01-08 14:52                           ` Christian König
2021-01-08 16:49                           ` Grodzovsky, Andrey
2021-01-08 16:49                             ` Grodzovsky, Andrey
2021-01-11 16:13                             ` Daniel Vetter
2021-01-11 16:13                               ` Daniel Vetter
2021-01-11 16:15                               ` Daniel Vetter
2021-01-11 16:15                                 ` Daniel Vetter
2021-01-11 17:41                                 ` Andrey Grodzovsky
2021-01-11 17:41                                   ` Andrey Grodzovsky
2021-01-11 18:31                                   ` Andrey Grodzovsky
2021-01-12  9:07                                     ` Daniel Vetter
2021-01-11 20:45                                 ` Andrey Grodzovsky
2021-01-11 20:45                                   ` Andrey Grodzovsky
2021-01-12  9:10                                   ` Daniel Vetter
2021-01-12  9:10                                     ` Daniel Vetter
2021-01-12 12:32                                     ` Christian König
2021-01-12 12:32                                       ` Christian König
2021-01-12 15:59                                       ` Andrey Grodzovsky
2021-01-12 15:59                                         ` Andrey Grodzovsky
2021-01-13  9:14                                         ` Christian König
2021-01-13  9:14                                           ` Christian König
2021-01-13 14:40                                           ` Andrey Grodzovsky
2021-01-13 14:40                                             ` Andrey Grodzovsky
2021-01-12 15:54                                     ` Andrey Grodzovsky
2021-01-12 15:54                                       ` Andrey Grodzovsky
2021-01-12  8:12                               ` Christian König
2021-01-12  8:12                                 ` Christian König
2021-01-12  9:13                                 ` Daniel Vetter
2021-01-12  9:13                                   ` Daniel Vetter
2020-11-21  5:21 ` [PATCH v3 02/12] drm: Unamp the entire device address space on device unplug Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-21 14:16   ` Christian König
2020-11-21 14:16     ` Christian König
2020-11-24 14:44     ` Daniel Vetter
2020-11-24 14:44       ` Daniel Vetter
2020-11-21  5:21 ` [PATCH v3 03/12] drm/ttm: Remap all page faults to per process dummy page Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-21  5:21 ` [PATCH v3 04/12] drm/ttm: Set dma addr to null after freee Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-21 14:13   ` Christian König
2020-11-21 14:13     ` Christian König
2020-11-23  5:15     ` Andrey Grodzovsky
2020-11-23  5:15       ` Andrey Grodzovsky
2020-11-23  8:04       ` Christian König
2020-11-23  8:04         ` Christian König
2020-11-21  5:21 ` [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-25 10:42   ` Christian König
2020-11-25 10:42     ` Christian König
2020-11-23 20:05     ` Andrey Grodzovsky
2020-11-23 20:05       ` Andrey Grodzovsky
2020-11-23 20:20       ` Christian König
2020-11-23 20:20         ` Christian König
2020-11-23 20:38         ` Andrey Grodzovsky
2020-11-23 20:38           ` Andrey Grodzovsky
2020-11-23 20:41           ` Christian König
2020-11-23 20:41             ` Christian König
2020-11-23 21:08             ` Andrey Grodzovsky
2020-11-23 21:08               ` Andrey Grodzovsky
2020-11-24  7:41               ` Christian König
2020-11-24  7:41                 ` Christian König
2020-11-24 16:22                 ` Andrey Grodzovsky
2020-11-24 16:22                   ` Andrey Grodzovsky
2020-11-24 16:44                   ` Christian König
2020-11-24 16:44                     ` Christian König
2020-11-25 10:40                     ` Daniel Vetter
2020-11-25 10:40                       ` Daniel Vetter
2020-11-25 12:57                       ` Christian König
2020-11-25 12:57                         ` Christian König
2020-11-25 16:36                         ` Daniel Vetter
2020-11-25 16:36                           ` Daniel Vetter
2020-11-25 19:34                           ` Andrey Grodzovsky
2020-11-25 19:34                             ` Andrey Grodzovsky
2020-11-27 13:10                             ` Grodzovsky, Andrey
2020-11-27 13:10                               ` Grodzovsky, Andrey
2020-11-27 14:59                             ` Daniel Vetter
2020-11-27 14:59                               ` Daniel Vetter
2020-11-27 16:04                               ` Andrey Grodzovsky
2020-11-27 16:04                                 ` Andrey Grodzovsky
2020-11-30 14:15                                 ` Daniel Vetter
2020-11-30 14:15                                   ` Daniel Vetter
2020-11-25 16:56                         ` Michel Dänzer
2020-11-25 16:56                           ` Michel Dänzer
2020-11-25 17:02                           ` Daniel Vetter
2020-11-25 17:02                             ` Daniel Vetter
2020-12-15 20:18                     ` Andrey Grodzovsky
2020-12-15 20:18                       ` Andrey Grodzovsky
2020-12-16  8:04                       ` Christian König
2020-12-16  8:04                         ` Christian König
2020-12-16 14:21                         ` Daniel Vetter
2020-12-16 14:21                           ` Daniel Vetter
2020-12-16 16:13                           ` Andrey Grodzovsky
2020-12-16 16:13                             ` Andrey Grodzovsky
2020-12-16 16:18                             ` Christian König
2020-12-16 16:18                               ` Christian König
2020-12-16 17:12                               ` Daniel Vetter
2020-12-16 17:12                                 ` Daniel Vetter
2020-12-16 17:20                                 ` Daniel Vetter
2020-12-16 17:20                                   ` Daniel Vetter
2020-12-16 18:26                                 ` Andrey Grodzovsky
2020-12-16 18:26                                   ` Andrey Grodzovsky
2020-12-16 23:15                                   ` Daniel Vetter
2020-12-16 23:15                                     ` Daniel Vetter
2020-12-17  0:20                                     ` Andrey Grodzovsky
2020-12-17  0:20                                       ` Andrey Grodzovsky
2020-12-17 12:01                                       ` Daniel Vetter
2020-12-17 12:01                                         ` Daniel Vetter
2020-12-17 19:19                                         ` Andrey Grodzovsky
2020-12-17 19:19                                           ` Andrey Grodzovsky
2020-12-17 20:10                                           ` Christian König
2020-12-17 20:10                                             ` Christian König
2020-12-17 20:38                                             ` Andrey Grodzovsky
2020-12-17 20:38                                               ` Andrey Grodzovsky
2020-12-17 20:48                                               ` Daniel Vetter
2020-12-17 20:48                                                 ` Daniel Vetter
2020-12-17 21:06                                                 ` Andrey Grodzovsky
2020-12-17 21:06                                                   ` Andrey Grodzovsky
2020-12-18 14:30                                                   ` Daniel Vetter
2020-12-18 14:30                                                     ` Daniel Vetter
2020-12-17 20:42                                           ` Daniel Vetter
2020-12-17 20:42                                             ` Daniel Vetter
2020-12-17 21:13                                             ` Andrey Grodzovsky
2020-12-17 21:13                                               ` Andrey Grodzovsky
2021-01-04 16:33                                               ` Andrey Grodzovsky
2021-01-04 16:33                                                 ` Andrey Grodzovsky
2020-11-21  5:21 ` [PATCH v3 06/12] drm/sched: Cancel and flush all oustatdning jobs before finish Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-22 11:56   ` Christian König
2020-11-22 11:56     ` Christian König
2020-11-21  5:21 ` [PATCH v3 07/12] drm/sched: Prevent any job recoveries after device is unplugged Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-22 11:57   ` Christian König
2020-11-22 11:57     ` Christian König
2020-11-23  5:37     ` Andrey Grodzovsky
2020-11-23  5:37       ` Andrey Grodzovsky
2020-11-23  8:06       ` Christian König
2020-11-23  8:06         ` Christian König
2020-11-24  1:12         ` Luben Tuikov
2020-11-24  1:12           ` Luben Tuikov
2020-11-24  7:50           ` Christian König
2020-11-24  7:50             ` Christian König
2020-11-24 17:11             ` Luben Tuikov
2020-11-24 17:11               ` Luben Tuikov
2020-11-24 17:17               ` Andrey Grodzovsky
2020-11-24 17:17                 ` Andrey Grodzovsky
2020-11-24 17:41                 ` Luben Tuikov
2020-11-24 17:41                   ` Luben Tuikov
2020-11-24 17:40               ` Christian König
2020-11-24 17:40                 ` Christian König
2020-11-24 17:44                 ` Luben Tuikov
2020-11-24 17:44                   ` Luben Tuikov
2020-11-21  5:21 ` [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-24 14:53   ` Daniel Vetter
2020-11-24 14:53     ` Daniel Vetter
2020-11-24 15:51     ` Andrey Grodzovsky
2020-11-24 15:51       ` Andrey Grodzovsky
2020-11-25 10:41       ` Daniel Vetter
2020-11-25 10:41         ` Daniel Vetter
2020-11-25 17:41         ` Andrey Grodzovsky
2020-11-25 17:41           ` Andrey Grodzovsky
2020-11-21  5:21 ` [PATCH v3 09/12] drm/amdgpu: Add early fini callback Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-21  5:21 ` [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-24 14:49   ` Daniel Vetter
2020-11-24 14:49     ` Daniel Vetter
2020-11-24 22:27     ` Andrey Grodzovsky
2020-11-24 22:27       ` Andrey Grodzovsky
2020-11-25  9:04       ` Daniel Vetter
2020-11-25  9:04         ` Daniel Vetter
2020-11-25 17:39         ` Andrey Grodzovsky
2020-11-25 17:39           ` Andrey Grodzovsky
2020-11-27 13:12           ` Grodzovsky, Andrey
2020-11-27 13:12             ` Grodzovsky, Andrey
2020-11-27 15:04           ` Daniel Vetter
2020-11-27 15:04             ` Daniel Vetter
2020-11-27 15:34             ` Andrey Grodzovsky
2020-11-27 15:34               ` Andrey Grodzovsky
2020-11-21  5:21 ` [PATCH v3 11/12] drm/amdgpu: Register IOMMU topology notifier per device Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
2020-11-21  5:21 ` [PATCH v3 12/12] drm/amdgpu: Fix a bunch of sdma code crash post device unplug Andrey Grodzovsky
2020-11-21  5:21   ` Andrey Grodzovsky
