* [PATCH 0/3] DG1 Lockdep warning fixes
@ 2021-09-22  8:38 ` Thomas Hellström
From: Thomas Hellström @ 2021-09-22  8:38 UTC
  To: intel-gfx, dri-devel; +Cc: maarten.lankhorst, matthew.auld

A couple of recent commits introduced lockdep warnings, breaking some
DG1 BAT tests.

Two fixes for those and one HAX patch making CI behave better.

Kai Vehmanen (1):
  HAX: component: do not leave master devres group open after bind

Thomas Hellström (2):
  drm/i915/gem: Fix a lockdep warning the __i915_gem_is_lmem() function
  drm/i915/ttm: Fix lockdep warning in __i915_gem_free_object()

 drivers/base/component.c                 | 5 +++--
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 2 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c  | 4 ++++
 3 files changed, 8 insertions(+), 3 deletions(-)

-- 
2.31.1


* [PATCH 1/3] drm/i915/gem: Fix a lockdep warning the __i915_gem_is_lmem() function
  2021-09-22  8:38 ` [Intel-gfx] " Thomas Hellström
@ 2021-09-22  8:38   ` Thomas Hellström
From: Thomas Hellström @ 2021-09-22  8:38 UTC
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Matthew Brost, Thomas Hellström

Somehow we managed to invert the test for i915_gem_object_evictable(),
which causes a warning in DG1 BAT, igt@debugfs_test@read_all_entries.

Fix the lock check to only warn if the object *is* indeed evictable and
not protected from eviction by fences.

Cc: Matthew Brost <matthew.brost@intel.com>
Fixes: 91160c839824 ("drm/i915: Take pinning into account in __i915_gem_object_is_lmem")

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index d659239fcbcc..444f8268b9c5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -67,7 +67,7 @@ bool __i915_gem_object_is_lmem(struct drm_i915_gem_object *obj)
 
 #ifdef CONFIG_LOCKDEP
 	GEM_WARN_ON(dma_resv_test_signaled(obj->base.resv, true) &&
-		    !i915_gem_object_evictable(obj));
+		    i915_gem_object_evictable(obj));
 #endif
 	return mr && (mr->type == INTEL_MEMORY_LOCAL ||
 		      mr->type == INTEL_MEMORY_STOLEN_LOCAL);
-- 
2.31.1
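
A minimal sketch of the inverted check, written as a user-space model rather than driver code. The struct and function names here (`obj_state`, `warn_before_fix`, `warn_after_fix`) are hypothetical, and the model assumes that dma_resv_test_signaled() returning true means no unsignaled fence protects the object from eviction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two driver queries; not the i915 API. */
struct obj_state {
	bool fences_signaled; /* models dma_resv_test_signaled() returning true */
	bool evictable;       /* models i915_gem_object_evictable() returning true */
};

/* Predicate before the patch: fires for objects that cannot be evicted. */
static bool warn_before_fix(struct obj_state s)
{
	return s.fences_signaled && !s.evictable;
}

/* Predicate after the patch: fires only for unprotected, evictable objects. */
static bool warn_after_fix(struct obj_state s)
{
	return s.fences_signaled && s.evictable;
}
```

In this model a non-evictable object with all fences signaled triggers the old predicate but not the new one, which is the case the commit message describes as a spurious warning.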



* [PATCH 2/3] drm/i915/ttm: Fix lockdep warning in __i915_gem_free_object()
  2021-09-22  8:38 ` [Intel-gfx] " Thomas Hellström
@ 2021-09-22  8:38   ` Thomas Hellström
From: Thomas Hellström @ 2021-09-22  8:38 UTC
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Thomas Hellström

In the mman selftest, some tests make the ttm_bo_init_reserved() fail,
which may trigger a call to the i915_ttm_bo_destroy() function.
However, at this point the gem object refcount is set to 1, which
triggers a lockdep warning in __i915_gem_free_object() and a
corresponding failure in DG1 BAT, i915_selftest@live@mman.

Fix this by clearing the gem object refcount if called from that
failure path.

Fixes: f9b23c157a78 ("drm/i915: Move __i915_gem_free_object to ttm_bo_destroy")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index b94497989995..b1f561543ff3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -900,6 +900,10 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
 
 	i915_ttm_backup_free(obj);
 
+	/* Failure during ttm_bo_init_reserved leaves the refcount set to 1. */
+	if (IS_ENABLED(CONFIG_LOCKDEP) && !obj->ttm.created)
+		refcount_set(&obj->base.refcount.refcount, 0);
+
 	/* This releases all gem object bindings to the backend. */
 	__i915_gem_free_object(obj);
 
-- 
2.31.1
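
A rough user-space model of why zeroing the refcount silences the warning. It rests on an assumption not stated in this thread: that the debug check in the free path only insists on the object lock while the refcount is still nonzero. `toy_obj`, `free_path_warns` and `destroy_fixup` are hypothetical names, not i915 symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy object; the fields are illustrative, not the real GEM layout. */
struct toy_obj {
	unsigned int refcount; /* models the GEM object's kref value */
	bool lock_held;        /* models holding the object's reservation lock */
	bool created;          /* models obj->ttm.created */
};

/* Assumed shape of the debug check: a still-referenced object must be locked. */
static bool free_path_warns(const struct toy_obj *obj)
{
	return obj->refcount > 0 && !obj->lock_held;
}

/* Mirrors the patch: zero the count when initialization never completed. */
static void destroy_fixup(struct toy_obj *obj)
{
	if (!obj->created)
		obj->refcount = 0;
}
```

An object whose initialization failed enters the destroy path unlocked with a refcount of 1; after `destroy_fixup()` the modelled check no longer fires, while a fully created object keeps its count untouched.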



* [PATCH 3/3] HAX: component: do not leave master devres group open after bind
  2021-09-22  8:38 ` [Intel-gfx] " Thomas Hellström
@ 2021-09-22  8:38   ` Thomas Hellström
From: Thomas Hellström @ 2021-09-22  8:38 UTC
  To: intel-gfx, dri-devel
  Cc: maarten.lankhorst, matthew.auld, Kai Vehmanen, Imre Deak, Russell King

From: Kai Vehmanen <kai.vehmanen@linux.intel.com>

In the current code, the devres group for the aggregate master is left
open after the call to component_master_add_*(). This causes problems
when the master makes further managed allocations of its own: when any
participating driver calls component_del(), those resources are
released immediately.

This came up when investigating a page fault occurring during i915 DRM
driver unbind on a 5.15-rc1 kernel. The following sequence occurs:

 i915_pci_remove()
   -> intel_display_driver_unregister()
     -> i915_audio_component_cleanup()
       -> component_del()
         -> component.c:take_down_master()
           -> hdac_component_master_unbind() [via master->ops->unbind()]
           -> devres_release_group(master->parent, NULL)

With older kernels this has not caused issues, but with the audio driver
moving to managed interfaces for more of its allocations, this no
longer works. The devres log shows the following:

component_master_add_with_match()
[  126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000323ccdc5 devm_component_match_release (24 bytes)
[  126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 00000000865cdb29 grp< (0 bytes)
[  126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 grp< (0 bytes)

audio driver completes its PCI probe()
[  126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 000000001b480725 pcim_iomap_release (48 bytes)

component_del() called at DRM/i915 unbind()
[  137.579422] i915 0000:00:02.0: DEVRES REL 00000000ef44c293 grp< (0 bytes)
[  137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 00000000865cdb29 grp< (0 bytes)
[  137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 000000001b480725 pcim_iomap_release (48 bytes)

So the "devres_release_group(master->parent, NULL)" ends up freeing the
pcim_iomap allocation. Upon next runtime resume, the audio driver will
cause a page fault as the iomap alloc was released without the driver
knowing about it.

Fix this issue by using the "struct master" pointer as identifier for
the devres group, and by closing the devres group after the master->ops->bind()
call is done. This allows devres allocations done by the driver acting as
master to be isolated from the binding state of the aggregate driver. This
modifies the logic originally introduced in commit 9e1ccb4a7700
("drivers/base: fix devres handling for master device").

BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Acked-by: Imre Deak <imre.deak@intel.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/base/component.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/base/component.c b/drivers/base/component.c
index 5e79299f6c3f..870485cbbb87 100644
--- a/drivers/base/component.c
+++ b/drivers/base/component.c
@@ -246,7 +246,7 @@ static int try_to_bring_up_master(struct master *master,
 		return 0;
 	}
 
-	if (!devres_open_group(master->parent, NULL, GFP_KERNEL))
+	if (!devres_open_group(master->parent, master, GFP_KERNEL))
 		return -ENOMEM;
 
 	/* Found all components */
@@ -258,6 +258,7 @@ static int try_to_bring_up_master(struct master *master,
 		return ret;
 	}
 
+	devres_close_group(master->parent, NULL);
 	master->bound = true;
 	return 1;
 }
@@ -282,7 +283,7 @@ static void take_down_master(struct master *master)
 {
 	if (master->bound) {
 		master->ops->unbind(master->parent);
-		devres_release_group(master->parent, NULL);
+		devres_release_group(master->parent, master);
 		master->bound = false;
 	}
 }
-- 
2.31.1
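
A minimal user-space model, not kernel code, of the devres group behaviour described in the commit message above. All names (`dev_model`, `close_group`, `release_group`, `late_alloc_survives`) are hypothetical. The model assumes that releasing a still-open group frees everything added after its marker, that a closed group covers only what was added before the close, and that a NULL id selects the most recently opened group that is still open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 16

/* One node on a toy per-device devres list: a resource or a group marker. */
struct entry {
	bool is_group;
	const void *id;  /* group identifier; NULL for resources and anonymous groups */
	int close_idx;   /* groups only: list length at close time, -1 while open */
	bool released;
};

struct dev_model {
	struct entry e[MAX_ENTRIES];
	int n;
};

/* Append a node and return its index. */
static int push(struct dev_model *d, bool is_group, const void *id)
{
	assert(d->n < MAX_ENTRIES);
	d->e[d->n] = (struct entry){ is_group, id, -1, false };
	return d->n++;
}

/* NULL id selects the most recently opened group that is still open. */
static int find_group(const struct dev_model *d, const void *id)
{
	for (int i = d->n - 1; i >= 0; i--) {
		const struct entry *g = &d->e[i];

		if (!g->is_group || g->released)
			continue;
		if (id ? g->id == id : g->close_idx < 0)
			return i;
	}
	return -1;
}

/* Closing a group fixes its extent at the current end of the list. */
static void close_group(struct dev_model *d, const void *id)
{
	int g = find_group(d, id);

	if (g >= 0)
		d->e[g].close_idx = d->n;
}

/* Release the group marker and everything it covers. */
static void release_group(struct dev_model *d, const void *id)
{
	int g = find_group(d, id);

	if (g < 0)
		return;
	int end = d->e[g].close_idx >= 0 ? d->e[g].close_idx : d->n;
	for (int i = g; i < end; i++)
		d->e[i].released = true;
}

/* Returns true if an allocation made after bind survives the unbind. */
static bool late_alloc_survives(bool apply_fix)
{
	struct dev_model d = { .n = 0 };
	int master_token; /* stands in for the struct master pointer */
	const void *id = apply_fix ? &master_token : NULL;

	push(&d, true, id);               /* devres_open_group() */
	push(&d, false, NULL);            /* allocation made inside bind() */
	if (apply_fix)
		close_group(&d, NULL);    /* devres_close_group() after bind() */
	int late = push(&d, false, NULL); /* e.g. the pcim_iomap allocation */
	release_group(&d, id);            /* devres_release_group() at unbind */
	return !d.e[late].released;
}
```

In this model `late_alloc_survives(false)` returns false, because the late allocation falls inside the still-open group and is released with it, while `late_alloc_survives(true)` returns true, because the group was closed before the late allocation was made.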



* [Intel-gfx] ✗ Fi.CI.BUILD: failure for DG1 Lockdep warning fixes
  2021-09-22  8:38 ` [Intel-gfx] " Thomas Hellström
                   ` (3 preceding siblings ...)
  (?)
@ 2021-09-22  9:57 ` Patchwork
From: Patchwork @ 2021-09-22  9:57 UTC
  To: Thomas Hellström; +Cc: intel-gfx

== Series Details ==

Series: DG1 Lockdep warning fixes
URL   : https://patchwork.freedesktop.org/series/94932/
State : failure

== Summary ==

Applying: drm/i915/gem: Fix a lockdep warning the __i915_gem_is_lmem() function
Applying: drm/i915/ttm: Fix lockdep warning in __i915_gem_free_object()
Using index info to reconstruct a base tree...
M	drivers/gpu/drm/i915/gem/i915_gem_ttm.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/gpu/drm/i915/gem/i915_gem_ttm.c
CONFLICT (content): Merge conflict in drivers/gpu/drm/i915/gem/i915_gem_ttm.c
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0002 drm/i915/ttm: Fix lockdep warning in __i915_gem_free_object()
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".




* Re: [Intel-gfx] [PATCH 1/3] drm/i915/gem: Fix a lockdep warning the __i915_gem_is_lmem() function
  2021-09-22  8:38   ` [Intel-gfx] " Thomas Hellström
  (?)
@ 2021-09-22 10:10   ` Matthew Auld
From: Matthew Auld @ 2021-09-22 10:10 UTC
  To: Thomas Hellström
  Cc: Intel Graphics Development, ML dri-devel, Maarten Lankhorst,
	Matthew Auld, Matthew Brost

On Wed, 22 Sept 2021 at 09:38, Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
> Somehow we managed to invert the test for i915_gem_object_evictable(),
> which causes a warning in DG1 BAT, igt@debugfs_test@read_all_entries.
>
> Fix the lock check to only warn if the object *is* indeed evictable and
> not protected from eviction by fences.
>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Fixes: 91160c839824 ("drm/i915: Take pinning into account in __i915_gem_object_is_lmem")
>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>


* Re: [PATCH 2/3] drm/i915/ttm: Fix lockdep warning in __i915_gem_free_object()
  2021-09-22  8:38   ` [Intel-gfx] " Thomas Hellström
@ 2021-09-22 10:55     ` Matthew Auld
From: Matthew Auld @ 2021-09-22 10:55 UTC
  To: Thomas Hellström
  Cc: Intel Graphics Development, ML dri-devel, Maarten Lankhorst,
	Matthew Auld

On Wed, 22 Sept 2021 at 09:38, Thomas Hellström
<thomas.hellstrom@linux.intel.com> wrote:
>
> In the mman selftest, some tests make the ttm_bo_init_reserved() fail,
> which may trigger a call to the i915_ttm_bo_destroy() function.
> However, at this point the gem object refcount is set to 1, which
> triggers a lockdep warning in __i915_gem_free_object() and a
> corresponding failure in DG1 BAT, i915_selftest@live@mman.
>
> Fix this by clearing the gem object refcount if called from that
> failure path.
>
> Fixes: f9b23c157a78 ("drm/i915: Move __i915_gem_free_object to ttm_bo_destroy")
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> index b94497989995..b1f561543ff3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
> @@ -900,6 +900,10 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
>
>         i915_ttm_backup_free(obj);
>
> +       /* Failure during ttm_bo_init_reserved leaves the refcount set to 1. */
> +       if (IS_ENABLED(CONFIG_LOCKDEP) && !obj->ttm.created)
> +               refcount_set(&obj->base.refcount.refcount, 0);
> +
>         /* This releases all gem object bindings to the backend. */
>         __i915_gem_free_object(obj);

The __i915_gem_free_object is also nuking stuff like mm.placements,
which is still owned by the caller AFAIK, or at least it is until we
have successfully initialised the object, so smells like potential
double free? Can we easily move that under the ttm.created check?
Otherwise maybe we are meant to move the mm.placements handling into
the RCU callback?

>
> --
> 2.31.1
>


* Re: [PATCH 2/3] drm/i915/ttm: Fix lockdep warning in __i915_gem_free_object()
  2021-09-22 10:55     ` [Intel-gfx] " Matthew Auld
@ 2021-09-22 11:34       ` Thomas Hellström
From: Thomas Hellström @ 2021-09-22 11:34 UTC
  To: Matthew Auld
  Cc: Intel Graphics Development, ML dri-devel, Maarten Lankhorst,
	Matthew Auld


On 9/22/21 12:55 PM, Matthew Auld wrote:
> On Wed, 22 Sept 2021 at 09:38, Thomas Hellström
> <thomas.hellstrom@linux.intel.com> wrote:
>> In the mman selftest, some tests make the ttm_bo_init_reserved() fail,
>> which may trigger a call to the i915_ttm_bo_destroy() function.
>> However, at this point the gem object refcount is set to 1, which
>> triggers a lockdep warning in __i915_gem_free_object() and a
>> corresponding failure in DG1 BAT, i915_selftest@live@mman.
>>
>> Fix this by clearing the gem object refcount if called from that
>> failure path.
>>
>> Fixes: f9b23c157a78 ("drm/i915: Move __i915_gem_free_object to ttm_bo_destroy")
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> index b94497989995..b1f561543ff3 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>> @@ -900,6 +900,10 @@ void i915_ttm_bo_destroy(struct ttm_buffer_object *bo)
>>
>>          i915_ttm_backup_free(obj);
>>
>> +       /* Failure during ttm_bo_init_reserved leaves the refcount set to 1. */
>> +       if (IS_ENABLED(CONFIG_LOCKDEP) && !obj->ttm.created)
>> +               refcount_set(&obj->base.refcount.refcount, 0);
>> +
>>          /* This releases all gem object bindings to the backend. */
>>          __i915_gem_free_object(obj);
> The __i915_gem_free_object is also nuking stuff like mm.placements,
> which is still owned by the caller AFAIK, or at least it is until we
> have successfully initialised the object, so smells like potential
> double free? Can we easily move that under the ttm.created check?
> Otherwise maybe we are meant to move the mm.placements handling into
> the RCU callback?

Yes, it indeed sounds like a closer look is needed for the error 
handling here. Perhaps it makes sense to initialize the TTM part and 
then the GEM part while still having the lock. Meanwhile I'll put it 
under the ttm.created check.

Thanks,

Thomas


>
>> --
>> 2.31.1
>>

