intel-xe.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
@ 2023-03-30 15:40 Chang, Bruce
  2023-03-30 15:42 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Chang, Bruce @ 2023-03-30 15:40 UTC (permalink / raw)
  To: intel-xe

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740]  <TASK>
[ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
[ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to share the TTM_PL_TT manager and use a count to only
allow the last instance to release.

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
---
 drivers/gpu/drm/xe/xe_device_types.h |  3 +++
 drivers/gpu/drm/xe/xe_gt.c           |  9 +++++++--
 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c  | 25 +++++++++++++++----------
 3 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 88f863edc41c..ea771f120c0f 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -195,6 +195,9 @@ struct xe_device {
 			/** @mapping: pointer to VRAM mappable space */
 			void *__iomem mapping;
 		} vram;
+		/** @gtt_mr: GTT TTM manager */
+		struct xe_ttm_gtt_mgr *gtt_mgr;
+		int instance;
 	} mem;
 
 	/** @usm: unified memory state */
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index fd7a5b43ba3e..e7ca141478ef 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -77,8 +77,13 @@ int xe_gt_alloc(struct xe_device *xe, struct xe_gt *gt)
 		if (!gt->mem.vram_mgr)
 			return -ENOMEM;
 
-		gt->mem.gtt_mgr = drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
-					       GFP_KERNEL);
+		if (!xe->mem.gtt_mgr) {
+			xe->mem.gtt_mgr =
+				drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
+					     GFP_KERNEL);
+			xe->mem.instance = 0;
+		}
+		gt->mem.gtt_mgr = xe->mem.gtt_mgr;
 		if (!gt->mem.gtt_mgr)
 			return -ENOMEM;
 	} else {
diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
index 8075781070f2..05300c71928e 100644
--- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
@@ -94,6 +94,9 @@ static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
 	struct ttm_resource_manager *man = &mgr->manager;
 	int err;
 
+	if (--xe->mem.instance)
+		return;
+
 	ttm_resource_manager_set_used(man, false);
 
 	err = ttm_resource_manager_evict_all(&xe->ttm, man);
@@ -113,18 +116,20 @@ int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
 
 	XE_BUG_ON(xe_gt_is_media_type(gt));
 
-	mgr->gt = gt;
-	man->use_tt = true;
-	man->func = &xe_ttm_gtt_mgr_func;
-
-	ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
+	if (!xe->mem.instance) {
+		mgr->gt = gt;
+		man->use_tt = true;
+		man->func = &xe_ttm_gtt_mgr_func;
 
-	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
-	ttm_resource_manager_set_used(man, true);
+		ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
 
-	err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
-	if (err)
-		return err;
+		ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
+		ttm_resource_manager_set_used(man, true);
+		err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
+		if (err)
+			return err;
+	}
+	xe->mem.instance ++;
 
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Intel-xe] ✓ CI.Patch_applied: success for drm/xe: fix pvc unload issue
  2023-03-30 15:40 [Intel-xe] [PATCH] drm/xe: fix pvc unload issue Chang, Bruce
@ 2023-03-30 15:42 ` Patchwork
  2023-03-30 15:43 ` [Intel-xe] ✗ CI.KUnit: failure " Patchwork
  2023-03-31  5:40 ` [Intel-xe] [PATCH] " Matthew Auld
  2 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2023-03-30 15:42 UTC (permalink / raw)
  To: Chang, Bruce; +Cc: intel-xe

== Series Details ==

Series: drm/xe: fix pvc unload issue
URL   : https://patchwork.freedesktop.org/series/115881/
State : success

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
commit 5de0375b04455c7c8e516f53715eedde50f5ae45
Author:     Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
AuthorDate: Thu Mar 23 15:05:18 2023 +0100
Commit:     Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
CommitDate: Wed Mar 29 11:24:42 2023 +0200

    drm/xe: Display api changes update
    
    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
=== git am output follows ===
Applying: drm/xe: fix pvc unload issue



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-xe] ✗ CI.KUnit: failure for drm/xe: fix pvc unload issue
  2023-03-30 15:40 [Intel-xe] [PATCH] drm/xe: fix pvc unload issue Chang, Bruce
  2023-03-30 15:42 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
@ 2023-03-30 15:43 ` Patchwork
  2023-03-31  5:40 ` [Intel-xe] [PATCH] " Matthew Auld
  2 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2023-03-30 15:43 UTC (permalink / raw)
  To: Chang, Bruce; +Cc: intel-xe

== Series Details ==

Series: drm/xe: fix pvc unload issue
URL   : https://patchwork.freedesktop.org/series/115881/
State : failure

== Summary ==

+ trap cleanup EXIT
+ /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/xe/.kunitconfig
ERROR:root:In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/drivers/chan_user.c:6:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/os-Linux/helper.c:6:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/os-Linux/irq.c:8:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/os-Linux/skas/process.c:7:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/drivers/fd.c:7:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/os-Linux/execvp.c:24:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/x86/um/os-Linux/registers.c:8:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/os-Linux/main.c:8:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/os-Linux/file.c:8:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
make[3]: *** [../scripts/Makefile.build:252: arch/um/os-Linux/irq.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[4]: *** [../scripts/Makefile.build:252: arch/x86/um/os-Linux/registers.o] Error 1
make[3]: *** [../scripts/Makefile.build:494: arch/x86/um/os-Linux] Error 2
make[3]: *** Waiting for unfinished jobs....
make[3]: *** [../scripts/Makefile.build:252: arch/um/os-Linux/execvp.o] Error 1
make[3]: *** [../scripts/Makefile.build:252: arch/um/drivers/chan_user.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: *** [../scripts/Makefile.build:252: arch/um/drivers/fd.o] Error 1
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/os-Linux/mem.c:8:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
In file included from /usr/include/stdlib.h:1013,
                 from ../arch/um/os-Linux/process.c:8:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
make[3]: *** [../scripts/Makefile.build:252: arch/um/os-Linux/helper.o] Error 1
make[4]: *** [../scripts/Makefile.build:252: arch/um/os-Linux/skas/process.o] Error 1
make[4]: *** Waiting for unfinished jobs....
make[3]: *** [../scripts/Makefile.build:252: arch/um/os-Linux/main.o] Error 1
make[3]: *** [../scripts/Makefile.build:252: arch/um/os-Linux/file.o] Error 1
make[3]: *** [../scripts/Makefile.build:252: arch/um/os-Linux/mem.o] Error 1
make[3]: *** [../scripts/Makefile.build:252: arch/um/os-Linux/process.o] Error 1
make[3]: *** [../scripts/Makefile.build:494: arch/um/os-Linux/skas] Error 2
make[2]: *** [../scripts/Makefile.build:494: arch/um/os-Linux] Error 2
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [../scripts/Makefile.build:494: arch/x86/um] Error 2
make[2]: *** [../scripts/Makefile.build:494: arch/um/drivers] Error 2
In file included from /usr/include/stdlib.h:1013,
                 from arch/um/kernel/config.c:7:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h: In function ‘atof’:
/usr/include/x86_64-linux-gnu/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
   26 | {
      | ^
make[3]: *** [../scripts/Makefile.build:252: arch/um/kernel/config.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [../scripts/Makefile.build:494: arch/um/kernel] Error 2
make[1]: *** [/kernel/Makefile:2025: .] Error 2
make: *** [Makefile:226: __sub-make] Error 2

[15:42:34] Configuring KUnit Kernel ...
Generating .config ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
[15:42:38] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=48
+ cleanup
++ stat -c %u:%g /kernel
+ chown -R 1003:1003 /kernel



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
  2023-03-30 15:40 [Intel-xe] [PATCH] drm/xe: fix pvc unload issue Chang, Bruce
  2023-03-30 15:42 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
  2023-03-30 15:43 ` [Intel-xe] ✗ CI.KUnit: failure " Patchwork
@ 2023-03-31  5:40 ` Matthew Auld
  2023-03-31 22:55   ` Chang, Yu bruce
  2 siblings, 1 reply; 11+ messages in thread
From: Matthew Auld @ 2023-03-31  5:40 UTC (permalink / raw)
  To: Chang, Bruce; +Cc: intel-xe

On Thu, 30 Mar 2023 at 16:40, Chang, Bruce <yu.bruce.chang@intel.com> wrote:
>
> Currently, unload pvc driver will generate a null dereference
> and the call stack is as below.
>
> [ 4850.618000] Call Trace:
> [ 4850.620740]  <TASK>
> [ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
> [ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
> [ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
> [ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
> [ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
> [ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
> [ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
> [ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
> [ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
> [ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
> [ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]
>
> There are a couple issues, but the main one is due to TTM has only
> one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
> 1 TTM_PL_TT each tile. The second will overwrite the first one.
>
> During unload time, the first tile will reset the TTM_PL_TT manger
> and when the second tile is trying to free Bo and it will generate
> the null reference since the TTM manage is already got reset to 0.
>
> The fix is to share the TTM_PL_TT manager and use a count to only
> allow the last instance to release.

Would it make sense to rather move it out of GT init, and just have
one instance in the xe_device? And with that to rather consider
xe->mem.vram_size, instead of gt->mem.vram_size when considering the
potential "GTT" size? I think PL_TT is just system memory that can be
touched by the GPU device (likely has a gtt mapping), so not sure why
that needs to be GT centric.

>
> Cc: Stuart Summers <stuart.summers@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device_types.h |  3 +++
>  drivers/gpu/drm/xe/xe_gt.c           |  9 +++++++--
>  drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c  | 25 +++++++++++++++----------
>  3 files changed, 25 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 88f863edc41c..ea771f120c0f 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -195,6 +195,9 @@ struct xe_device {
>                         /** @mapping: pointer to VRAM mappable space */
>                         void *__iomem mapping;
>                 } vram;
> +               /** @gtt_mr: GTT TTM manager */
> +               struct xe_ttm_gtt_mgr *gtt_mgr;
> +               int instance;
>         } mem;
>
>         /** @usm: unified memory state */
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index fd7a5b43ba3e..e7ca141478ef 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -77,8 +77,13 @@ int xe_gt_alloc(struct xe_device *xe, struct xe_gt *gt)
>                 if (!gt->mem.vram_mgr)
>                         return -ENOMEM;
>
> -               gt->mem.gtt_mgr = drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
> -                                              GFP_KERNEL);
> +               if (!xe->mem.gtt_mgr) {
> +                       xe->mem.gtt_mgr =
> +                               drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
> +                                            GFP_KERNEL);
> +                       xe->mem.instance = 0;
> +               }
> +               gt->mem.gtt_mgr = xe->mem.gtt_mgr;
>                 if (!gt->mem.gtt_mgr)
>                         return -ENOMEM;
>         } else {
> diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> index 8075781070f2..05300c71928e 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> @@ -94,6 +94,9 @@ static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
>         struct ttm_resource_manager *man = &mgr->manager;
>         int err;
>
> +       if (--xe->mem.instance)
> +               return;
> +
>         ttm_resource_manager_set_used(man, false);
>
>         err = ttm_resource_manager_evict_all(&xe->ttm, man);
> @@ -113,18 +116,20 @@ int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
>
>         XE_BUG_ON(xe_gt_is_media_type(gt));
>
> -       mgr->gt = gt;
> -       man->use_tt = true;
> -       man->func = &xe_ttm_gtt_mgr_func;
> -
> -       ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
> +       if (!xe->mem.instance) {
> +               mgr->gt = gt;
> +               man->use_tt = true;
> +               man->func = &xe_ttm_gtt_mgr_func;
>
> -       ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> -       ttm_resource_manager_set_used(man, true);
> +               ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
>
> -       err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
> -       if (err)
> -               return err;
> +               ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> +               ttm_resource_manager_set_used(man, true);
> +               err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
> +               if (err)
> +                       return err;
> +       }
> +       xe->mem.instance ++;
>
>         return 0;
>  }
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
  2023-03-31  5:40 ` [Intel-xe] [PATCH] " Matthew Auld
@ 2023-03-31 22:55   ` Chang, Yu bruce
  0 siblings, 0 replies; 11+ messages in thread
From: Chang, Yu bruce @ 2023-03-31 22:55 UTC (permalink / raw)
  To: Matthew Auld; +Cc: intel-xe



> -----Original Message-----
> From: Matthew Auld <matthew.william.auld@gmail.com>
> Sent: Thursday, March 30, 2023 10:40 PM
> To: Chang, Yu bruce <yu.bruce.chang@intel.com>
> Cc: intel-xe@lists.freedesktop.org; Thomas Hellström
> <thomas.hellstrom@linux.intel.com>
> Subject: Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
> 
> On Thu, 30 Mar 2023 at 16:40, Chang, Bruce <yu.bruce.chang@intel.com>
> wrote:
> >
> > Currently, unload pvc driver will generate a null dereference and the
> > call stack is as below.
> >
> > [ 4850.618000] Call Trace:
> > [ 4850.620740]  <TASK>
> > [ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm] [
> > 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm] [ 4850.633317]  ?
> > drm_buddy_fini+0x62/0x80 [drm_buddy] [ 4850.638487]  ?
> > __kmem_cache_free+0x27d/0x2c0 [ 4850.643054]  ttm_bo_put+0x38/0x60
> > [ttm] [ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe] [ 4850.651945]
> > drm_gem_object_free+0x1e/0x30 [drm] [ 4850.656904]
> > ggtt_fini_noalloc+0x9d/0xe0 [xe] [ 4850.661574]
> > drm_managed_release+0xb5/0x150 [drm] [ 4850.666617]
> > drm_dev_release+0x30/0x50 [drm] [ 4850.671209]
> > devm_drm_dev_init_release+0x3c/0x60 [drm]
> >
> > There are a couple issues, but the main one is due to TTM has only one
> > TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
> > 1 TTM_PL_TT each tile. The second will overwrite the first one.
> >
> > During unload time, the first tile will reset the TTM_PL_TT manger and
> > when the second tile is trying to free Bo and it will generate the
> > null reference since the TTM manage is already got reset to 0.
> >
> > The fix is to share the TTM_PL_TT manager and use a count to only
> > allow the last instance to release.
> 
> Would it make sense to rather move it out of GT init, and just have one
> instance in the xe_device? And with that to rather consider
> xe->mem.vram_size, instead of gt->mem.vram_size when considering the
> potential "GTT" size? I think PL_TT is just system memory that can be
> touched by the GPU device (likely has a gtt mapping), so not sure why that
> needs to be GT centric.
> 

[BC] Also got comment from Matt as below
"
Matthew Auld is correct, that the gtt_mgr is the system memory manager for
TTM and this is a per device resource - that is where it should live, be initialized,
and fini. Likely change any args in public functions to a device rather than GT too. 
...
let’s adjust the name too. s/gtt_mgr/sys_mgr

Matt
"

[BC] actually I thought about it, but I didn't want to make too much change
and just want to fix the unloading issue. 

Totally agree that the gtt should be global and not per gt. Will try to make
the change.

Thanks,
Bruce

> >
> > Cc: Stuart Summers <stuart.summers@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> > ---
> >  drivers/gpu/drm/xe/xe_device_types.h |  3 +++
> >  drivers/gpu/drm/xe/xe_gt.c           |  9 +++++++--
> >  drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c  | 25 +++++++++++++++----------
> >  3 files changed, 25 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h
> > b/drivers/gpu/drm/xe/xe_device_types.h
> > index 88f863edc41c..ea771f120c0f 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -195,6 +195,9 @@ struct xe_device {
> >                         /** @mapping: pointer to VRAM mappable space */
> >                         void *__iomem mapping;
> >                 } vram;
> > +               /** @gtt_mr: GTT TTM manager */
> > +               struct xe_ttm_gtt_mgr *gtt_mgr;
> > +               int instance;
> >         } mem;
> >
> >         /** @usm: unified memory state */ diff --git
> > a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c index
> > fd7a5b43ba3e..e7ca141478ef 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -77,8 +77,13 @@ int xe_gt_alloc(struct xe_device *xe, struct xe_gt
> *gt)
> >                 if (!gt->mem.vram_mgr)
> >                         return -ENOMEM;
> >
> > -               gt->mem.gtt_mgr = drmm_kzalloc(drm, sizeof(*gt-
> >mem.gtt_mgr),
> > -                                              GFP_KERNEL);
> > +               if (!xe->mem.gtt_mgr) {
> > +                       xe->mem.gtt_mgr =
> > +                               drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
> > +                                            GFP_KERNEL);
> > +                       xe->mem.instance = 0;
> > +               }
> > +               gt->mem.gtt_mgr = xe->mem.gtt_mgr;
> >                 if (!gt->mem.gtt_mgr)
> >                         return -ENOMEM;
> >         } else {
> > diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> > b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> > index 8075781070f2..05300c71928e 100644
> > --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> > +++ b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> > @@ -94,6 +94,9 @@ static void ttm_gtt_mgr_fini(struct drm_device *drm,
> void *arg)
> >         struct ttm_resource_manager *man = &mgr->manager;
> >         int err;
> >
> > +       if (--xe->mem.instance)
> > +               return;
> > +
> >         ttm_resource_manager_set_used(man, false);
> >
> >         err = ttm_resource_manager_evict_all(&xe->ttm, man); @@
> > -113,18 +116,20 @@ int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct
> > xe_ttm_gtt_mgr *mgr,
> >
> >         XE_BUG_ON(xe_gt_is_media_type(gt));
> >
> > -       mgr->gt = gt;
> > -       man->use_tt = true;
> > -       man->func = &xe_ttm_gtt_mgr_func;
> > -
> > -       ttm_resource_manager_init(man, &xe->ttm, gtt_size >>
> PAGE_SHIFT);
> > +       if (!xe->mem.instance) {
> > +               mgr->gt = gt;
> > +               man->use_tt = true;
> > +               man->func = &xe_ttm_gtt_mgr_func;
> >
> > -       ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> > -       ttm_resource_manager_set_used(man, true);
> > +               ttm_resource_manager_init(man, &xe->ttm, gtt_size >>
> > + PAGE_SHIFT);
> >
> > -       err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
> > -       if (err)
> > -               return err;
> > +               ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> > +               ttm_resource_manager_set_used(man, true);
> > +               err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini,
> mgr);
> > +               if (err)
> > +                       return err;
> > +       }
> > +       xe->mem.instance ++;
> >
> >         return 0;
> >  }
> > --
> > 2.25.1
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
  2023-04-04 19:45   ` Rodrigo Vivi
@ 2023-04-04 20:22     ` Lucas De Marchi
  0 siblings, 0 replies; 11+ messages in thread
From: Lucas De Marchi @ 2023-04-04 20:22 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: Stuart Summers, Chang, Bruce, intel-xe

On Tue, Apr 04, 2023 at 03:45:05PM -0400, Rodrigo Vivi wrote:
>On Tue, Apr 04, 2023 at 11:48:39AM -0700, Lucas De Marchi wrote:
>> On Mon, Apr 03, 2023 at 10:20:31PM +0000, Chang, Bruce wrote:
>> > Currently, unload pvc driver will generate a null dereference
>> > and the call stack is as below.
>> >
>> > [ 4850.618000] Call Trace:
>> > [ 4850.620740]  <TASK>
>> > [ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
>> > [ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
>> > [ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
>> > [ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
>> > [ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
>> > [ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
>> > [ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
>> > [ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
>> > [ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
>> > [ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
>> > [ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]
>> >
>> > There are a couple issues, but the main one is due to TTM has only
>> > one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
>> > 1 TTM_PL_TT each tile. The second will overwrite the first one.
>> >
>> > During unload time, the first tile will reset the TTM_PL_TT manger
>> > and when the second tile is trying to free Bo and it will generate
>> > the null reference since the TTM manage is already got reset to 0.
>> >
>> > The fix is to use one global TTM_PL_TT manager.
>> >
>> > v2: make gtt mgr global and change the name to sys_mgr
>> >
>> > Cc: Stuart Summers <stuart.summers@intel.com>
>> > Cc: Matthew Brost <matthew.brost@intel.com>
>> > Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
>> > Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
>> > ---
>> > drivers/gpu/drm/xe/Makefile                   |  2 +-
>> > drivers/gpu/drm/xe/xe_device.c                |  2 +
>> > drivers/gpu/drm/xe/xe_device.h                |  1 +
>> > drivers/gpu/drm/xe/xe_device_types.h          |  2 +
>> > drivers/gpu/drm/xe/xe_gt.c                    | 18 -----
>> > drivers/gpu/drm/xe/xe_gt_types.h              |  2 -
>> > drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h           | 16 -----
>> > drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h     | 18 -----
>> > .../xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} | 67 +++++++------------
>> > 9 files changed, 31 insertions(+), 97 deletions(-)
>> > delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
>> > delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
>> > rename drivers/gpu/drm/xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} (52%)
>> >
>> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> > index 669c7b6f552c..7e19a15af3dc 100644
>> > --- a/drivers/gpu/drm/xe/Makefile
>> > +++ b/drivers/gpu/drm/xe/Makefile
>> > @@ -83,7 +83,7 @@ xe-y += xe_bb.o \
>> > 	xe_step.o \
>> > 	xe_sync.o \
>> > 	xe_trace.o \
>> > -	xe_ttm_gtt_mgr.o \
>> > +	xe_ttm_sys_mgr.o \
>> > 	xe_ttm_stolen_mgr.o \
>> > 	xe_ttm_vram_mgr.o \
>> > 	xe_tuning.o \
>> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> > index a79f934e3d2d..4fb9aff27686 100644
>> > --- a/drivers/gpu/drm/xe/xe_device.c
>> > +++ b/drivers/gpu/drm/xe/xe_device.c
>> > @@ -279,6 +279,8 @@ int xe_device_probe(struct xe_device *xe)
>> > 	if (err)
>> > 		goto err_irq_shutdown;
>> >
>> > +	xe_ttm_sys_mgr_init(xe);
>> > +
>> > 	for_each_gt(gt, xe, id) {
>> > 		err = xe_gt_init_noalloc(gt);
>> > 		if (err)
>> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> > index d277f8985f7b..d9d1b09a8e38 100644
>> > --- a/drivers/gpu/drm/xe/xe_device.h
>> > +++ b/drivers/gpu/drm/xe/xe_device.h
>> > @@ -116,4 +116,5 @@ static inline bool xe_device_has_flat_ccs(struct xe_device *xe)
>> > }
>> >
>> > u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
>> > +int xe_ttm_sys_mgr_init(struct xe_device *xe);
>>
>>   CC [M]  drivers/gpu/drm/xe/tests/xe_bo_test.o
>> ../drivers/gpu/drm/xe/xe_ttm_sys_mgr.c:99:5: error: no previous prototype for ‘xe_ttm_sys_mgr_init’ [-Werror=missing-prototypes]
>>    99 | int xe_ttm_sys_mgr_init(struct xe_device *xe)
>>       |     ^~~~~~~~~~~~~~~~~~~
>>
>
>that's awkward... I didn't get issue here, not CI got it...

I enable W=1 every time I'm going to push so I get this kind of error.

We used to have it in the gitlab CI, but lost when migrated to the
mailing list.  I talked with Ryszard to setup CI to run
"dev-provided-checks" so we can enable this back and probably run some
more. Hopefully we can add it back soon.

FWIW these are the commands I use:

	alias m='make -j$(nproc)'
	alias m64='m O=build64'
	m64
	./scripts/config --file build64/.config --disable CONFIG_DRM_XE_DISPLAY && \
		m64 modules_prepare && m64 M=drivers/gpu/drm/xe W=1
	./scripts/config --file build64/.config --enable  CONFIG_DRM_XE_DISPLAY

which does the trick of "build everything, rebuild only xe.ko with
display disabled and W=1 (otherwise it fails), then move
CONFIG_DRM_XE_DISPLAY back on.

Lucas De Marchi

>
>>
>> couple of issues:
>>
>> 1) you moved the implementation to a new file and didn't include
>> xe_device.h, creating the build issue above
>
>I was wondering about gcc version, but this wouldn't be magic in older gcc...
>
>> 2) The decl shouldn't be here though. You need a xe_ttm_sys_mgr.h, just
>> like we have xe_ttm_stolen_mgr.h, xe_ttm_vram_mgr.h, xe_ttm_vram_mgr_types.h
>
>yeap, that sounds the way to go...
>
>>
>> Lucas De Marchi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
  2023-04-04 18:48 ` Lucas De Marchi
@ 2023-04-04 19:45   ` Rodrigo Vivi
  2023-04-04 20:22     ` Lucas De Marchi
  0 siblings, 1 reply; 11+ messages in thread
From: Rodrigo Vivi @ 2023-04-04 19:45 UTC (permalink / raw)
  To: Lucas De Marchi; +Cc: Stuart Summers, Vivi, Chang, Bruce, intel-xe

On Tue, Apr 04, 2023 at 11:48:39AM -0700, Lucas De Marchi wrote:
> On Mon, Apr 03, 2023 at 10:20:31PM +0000, Chang, Bruce wrote:
> > Currently, unload pvc driver will generate a null dereference
> > and the call stack is as below.
> > 
> > [ 4850.618000] Call Trace:
> > [ 4850.620740]  <TASK>
> > [ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
> > [ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
> > [ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
> > [ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
> > [ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
> > [ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
> > [ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
> > [ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
> > [ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
> > [ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
> > [ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]
> > 
> > There are a couple issues, but the main one is due to TTM has only
> > one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
> > 1 TTM_PL_TT each tile. The second will overwrite the first one.
> > 
> > During unload time, the first tile will reset the TTM_PL_TT manger
> > and when the second tile is trying to free Bo and it will generate
> > the null reference since the TTM manage is already got reset to 0.
> > 
> > The fix is to use one global TTM_PL_TT manager.
> > 
> > v2: make gtt mgr global and change the name to sys_mgr
> > 
> > Cc: Stuart Summers <stuart.summers@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> > Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> > ---
> > drivers/gpu/drm/xe/Makefile                   |  2 +-
> > drivers/gpu/drm/xe/xe_device.c                |  2 +
> > drivers/gpu/drm/xe/xe_device.h                |  1 +
> > drivers/gpu/drm/xe/xe_device_types.h          |  2 +
> > drivers/gpu/drm/xe/xe_gt.c                    | 18 -----
> > drivers/gpu/drm/xe/xe_gt_types.h              |  2 -
> > drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h           | 16 -----
> > drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h     | 18 -----
> > .../xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} | 67 +++++++------------
> > 9 files changed, 31 insertions(+), 97 deletions(-)
> > delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
> > delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
> > rename drivers/gpu/drm/xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} (52%)
> > 
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 669c7b6f552c..7e19a15af3dc 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -83,7 +83,7 @@ xe-y += xe_bb.o \
> > 	xe_step.o \
> > 	xe_sync.o \
> > 	xe_trace.o \
> > -	xe_ttm_gtt_mgr.o \
> > +	xe_ttm_sys_mgr.o \
> > 	xe_ttm_stolen_mgr.o \
> > 	xe_ttm_vram_mgr.o \
> > 	xe_tuning.o \
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index a79f934e3d2d..4fb9aff27686 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -279,6 +279,8 @@ int xe_device_probe(struct xe_device *xe)
> > 	if (err)
> > 		goto err_irq_shutdown;
> > 
> > +	xe_ttm_sys_mgr_init(xe);
> > +
> > 	for_each_gt(gt, xe, id) {
> > 		err = xe_gt_init_noalloc(gt);
> > 		if (err)
> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > index d277f8985f7b..d9d1b09a8e38 100644
> > --- a/drivers/gpu/drm/xe/xe_device.h
> > +++ b/drivers/gpu/drm/xe/xe_device.h
> > @@ -116,4 +116,5 @@ static inline bool xe_device_has_flat_ccs(struct xe_device *xe)
> > }
> > 
> > u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
> > +int xe_ttm_sys_mgr_init(struct xe_device *xe);
> 
>   CC [M]  drivers/gpu/drm/xe/tests/xe_bo_test.o
> ../drivers/gpu/drm/xe/xe_ttm_sys_mgr.c:99:5: error: no previous prototype for ‘xe_ttm_sys_mgr_init’ [-Werror=missing-prototypes]
>    99 | int xe_ttm_sys_mgr_init(struct xe_device *xe)
>       |     ^~~~~~~~~~~~~~~~~~~
> 

that's awkward... I didn't get issue here, not CI got it...

> 
> couple of issues:
> 
> 1) you moved the implementation to a new file and didn't include
> xe_device.h, creating the build issue above

I was wondering about gcc version, but this wouldn't be magic in older gcc...

> 2) The decl shouldn't be here though. You need a xe_ttm_sys_mgr.h, just
> like we have xe_ttm_stolen_mgr.h, xe_ttm_vram_mgr.h, xe_ttm_vram_mgr_types.h

yeap, that sounds the way to go...

> 
> Lucas De Marchi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
  2023-04-03 22:20 Chang, Bruce
  2023-04-04  3:44 ` Matthew Brost
@ 2023-04-04 18:48 ` Lucas De Marchi
  2023-04-04 19:45   ` Rodrigo Vivi
  1 sibling, 1 reply; 11+ messages in thread
From: Lucas De Marchi @ 2023-04-04 18:48 UTC (permalink / raw)
  To: Chang, Bruce; +Cc: Stuart Summers, Vivi, intel-xe, Rodrigo

On Mon, Apr 03, 2023 at 10:20:31PM +0000, Chang, Bruce wrote:
>Currently, unload pvc driver will generate a null dereference
>and the call stack is as below.
>
>[ 4850.618000] Call Trace:
>[ 4850.620740]  <TASK>
>[ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
>[ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
>[ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
>[ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
>[ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
>[ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
>[ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
>[ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
>[ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
>[ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
>[ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]
>
>There are a couple issues, but the main one is due to TTM has only
>one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
>1 TTM_PL_TT each tile. The second will overwrite the first one.
>
>During unload time, the first tile will reset the TTM_PL_TT manger
>and when the second tile is trying to free Bo and it will generate
>the null reference since the TTM manage is already got reset to 0.
>
>The fix is to use one global TTM_PL_TT manager.
>
>v2: make gtt mgr global and change the name to sys_mgr
>
>Cc: Stuart Summers <stuart.summers@intel.com>
>Cc: Matthew Brost <matthew.brost@intel.com>
>Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
>Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
>---
> drivers/gpu/drm/xe/Makefile                   |  2 +-
> drivers/gpu/drm/xe/xe_device.c                |  2 +
> drivers/gpu/drm/xe/xe_device.h                |  1 +
> drivers/gpu/drm/xe/xe_device_types.h          |  2 +
> drivers/gpu/drm/xe/xe_gt.c                    | 18 -----
> drivers/gpu/drm/xe/xe_gt_types.h              |  2 -
> drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h           | 16 -----
> drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h     | 18 -----
> .../xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} | 67 +++++++------------
> 9 files changed, 31 insertions(+), 97 deletions(-)
> delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
> delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
> rename drivers/gpu/drm/xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} (52%)
>
>diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>index 669c7b6f552c..7e19a15af3dc 100644
>--- a/drivers/gpu/drm/xe/Makefile
>+++ b/drivers/gpu/drm/xe/Makefile
>@@ -83,7 +83,7 @@ xe-y += xe_bb.o \
> 	xe_step.o \
> 	xe_sync.o \
> 	xe_trace.o \
>-	xe_ttm_gtt_mgr.o \
>+	xe_ttm_sys_mgr.o \
> 	xe_ttm_stolen_mgr.o \
> 	xe_ttm_vram_mgr.o \
> 	xe_tuning.o \
>diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>index a79f934e3d2d..4fb9aff27686 100644
>--- a/drivers/gpu/drm/xe/xe_device.c
>+++ b/drivers/gpu/drm/xe/xe_device.c
>@@ -279,6 +279,8 @@ int xe_device_probe(struct xe_device *xe)
> 	if (err)
> 		goto err_irq_shutdown;
>
>+	xe_ttm_sys_mgr_init(xe);
>+
> 	for_each_gt(gt, xe, id) {
> 		err = xe_gt_init_noalloc(gt);
> 		if (err)
>diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>index d277f8985f7b..d9d1b09a8e38 100644
>--- a/drivers/gpu/drm/xe/xe_device.h
>+++ b/drivers/gpu/drm/xe/xe_device.h
>@@ -116,4 +116,5 @@ static inline bool xe_device_has_flat_ccs(struct xe_device *xe)
> }
>
> u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
>+int xe_ttm_sys_mgr_init(struct xe_device *xe);

   CC [M]  drivers/gpu/drm/xe/tests/xe_bo_test.o
../drivers/gpu/drm/xe/xe_ttm_sys_mgr.c:99:5: error: no previous prototype for ‘xe_ttm_sys_mgr_init’ [-Werror=missing-prototypes]
    99 | int xe_ttm_sys_mgr_init(struct xe_device *xe)
       |     ^~~~~~~~~~~~~~~~~~~


couple of issues:

1) you moved the implementation to a new file and didn't include
xe_device.h, creating the build issue above
2) The decl shouldn't be here though. You need a xe_ttm_sys_mgr.h, just
like we have xe_ttm_stolen_mgr.h, xe_ttm_vram_mgr.h, xe_ttm_vram_mgr_types.h

Lucas De Marchi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
  2023-04-04  3:44 ` Matthew Brost
@ 2023-04-04 16:29   ` Rodrigo Vivi
  0 siblings, 0 replies; 11+ messages in thread
From: Rodrigo Vivi @ 2023-04-04 16:29 UTC (permalink / raw)
  To: Matthew Brost; +Cc: Stuart Summers, Vivi, Chang, Bruce, intel-xe

On Tue, Apr 04, 2023 at 03:44:18AM +0000, Matthew Brost wrote:
> On Mon, Apr 03, 2023 at 10:20:31PM +0000, Chang, Bruce wrote:
> > Currently, unload pvc driver will generate a null dereference
> > and the call stack is as below.
> > 
> > [ 4850.618000] Call Trace:
> > [ 4850.620740]  <TASK>
> > [ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
> > [ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
> > [ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
> > [ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
> > [ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
> > [ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
> > [ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
> > [ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
> > [ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
> > [ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
> > [ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]
> > 
> > There are a couple issues, but the main one is due to TTM has only
> > one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
> > 1 TTM_PL_TT each tile. The second will overwrite the first one.
> > 
> > During unload time, the first tile will reset the TTM_PL_TT manger
> > and when the second tile is trying to free Bo and it will generate
> > the null reference since the TTM manage is already got reset to 0.
> > 
> > The fix is to use one global TTM_PL_TT manager.
> > 
> > v2: make gtt mgr global and change the name to sys_mgr
> > 
> > Cc: Stuart Summers <stuart.summers@intel.com>
> > Cc: Matthew Brost <matthew.brost@intel.com>
> > Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> > Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> 
> Reviewed-by: Matthew Brost <matthew.brost@intel.com>

pushed to drm-xe-next. Thanks for the patch and review.

> 
> > ---
> >  drivers/gpu/drm/xe/Makefile                   |  2 +-
> >  drivers/gpu/drm/xe/xe_device.c                |  2 +
> >  drivers/gpu/drm/xe/xe_device.h                |  1 +
> >  drivers/gpu/drm/xe/xe_device_types.h          |  2 +
> >  drivers/gpu/drm/xe/xe_gt.c                    | 18 -----
> >  drivers/gpu/drm/xe/xe_gt_types.h              |  2 -
> >  drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h           | 16 -----
> >  drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h     | 18 -----
> >  .../xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} | 67 +++++++------------
> >  9 files changed, 31 insertions(+), 97 deletions(-)
> >  delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
> >  delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
> >  rename drivers/gpu/drm/xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} (52%)
> > 
> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> > index 669c7b6f552c..7e19a15af3dc 100644
> > --- a/drivers/gpu/drm/xe/Makefile
> > +++ b/drivers/gpu/drm/xe/Makefile
> > @@ -83,7 +83,7 @@ xe-y += xe_bb.o \
> >  	xe_step.o \
> >  	xe_sync.o \
> >  	xe_trace.o \
> > -	xe_ttm_gtt_mgr.o \
> > +	xe_ttm_sys_mgr.o \
> >  	xe_ttm_stolen_mgr.o \
> >  	xe_ttm_vram_mgr.o \
> >  	xe_tuning.o \
> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> > index a79f934e3d2d..4fb9aff27686 100644
> > --- a/drivers/gpu/drm/xe/xe_device.c
> > +++ b/drivers/gpu/drm/xe/xe_device.c
> > @@ -279,6 +279,8 @@ int xe_device_probe(struct xe_device *xe)
> >  	if (err)
> >  		goto err_irq_shutdown;
> >  
> > +	xe_ttm_sys_mgr_init(xe);
> > +
> >  	for_each_gt(gt, xe, id) {
> >  		err = xe_gt_init_noalloc(gt);
> >  		if (err)
> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> > index d277f8985f7b..d9d1b09a8e38 100644
> > --- a/drivers/gpu/drm/xe/xe_device.h
> > +++ b/drivers/gpu/drm/xe/xe_device.h
> > @@ -116,4 +116,5 @@ static inline bool xe_device_has_flat_ccs(struct xe_device *xe)
> >  }
> >  
> >  u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
> > +int xe_ttm_sys_mgr_init(struct xe_device *xe);
> >  #endif
> > diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> > index 88f863edc41c..6fb53a54913f 100644
> > --- a/drivers/gpu/drm/xe/xe_device_types.h
> > +++ b/drivers/gpu/drm/xe/xe_device_types.h
> > @@ -195,6 +195,8 @@ struct xe_device {
> >  			/** @mapping: pointer to VRAM mappable space */
> >  			void *__iomem mapping;
> >  		} vram;
> > +		/** @sys_mgr: system TTM manager */
> > +		struct ttm_resource_manager sys_mgr;
> >  	} mem;
> >  
> >  	/** @usm: unified memory state */
> > diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> > index fd7a5b43ba3e..a6ea138a7877 100644
> > --- a/drivers/gpu/drm/xe/xe_gt.c
> > +++ b/drivers/gpu/drm/xe/xe_gt.c
> > @@ -36,7 +36,6 @@
> >  #include "xe_ring_ops.h"
> >  #include "xe_sa.h"
> >  #include "xe_sched_job.h"
> > -#include "xe_ttm_gtt_mgr.h"
> >  #include "xe_ttm_vram_mgr.h"
> >  #include "xe_tuning.h"
> >  #include "xe_uc.h"
> > @@ -77,16 +76,11 @@ int xe_gt_alloc(struct xe_device *xe, struct xe_gt *gt)
> >  		if (!gt->mem.vram_mgr)
> >  			return -ENOMEM;
> >  
> > -		gt->mem.gtt_mgr = drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
> > -					       GFP_KERNEL);
> > -		if (!gt->mem.gtt_mgr)
> > -			return -ENOMEM;
> >  	} else {
> >  		struct xe_gt *full_gt = xe_find_full_gt(gt);
> >  
> >  		gt->mem.ggtt = full_gt->mem.ggtt;
> >  		gt->mem.vram_mgr = full_gt->mem.vram_mgr;
> > -		gt->mem.gtt_mgr = full_gt->mem.gtt_mgr;
> >  	}
> >  
> >  	gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0);
> > @@ -98,26 +92,14 @@ static int gt_ttm_mgr_init(struct xe_gt *gt)
> >  {
> >  	struct xe_device *xe = gt_to_xe(gt);
> >  	int err;
> > -	struct sysinfo si;
> > -	u64 gtt_size;
> > -
> > -	si_meminfo(&si);
> > -	gtt_size = (u64)si.totalram * si.mem_unit * 3/4;
> >  
> >  	if (gt->mem.vram.size) {
> >  		err = xe_ttm_vram_mgr_init(gt, gt->mem.vram_mgr);
> >  		if (err)
> >  			return err;
> > -		gtt_size = min(max((XE_DEFAULT_GTT_SIZE_MB << 20),
> > -				   (u64)gt->mem.vram.size),
> > -			       gtt_size);
> >  		xe->info.mem_region_mask |= BIT(gt->info.vram_id) << 1;
> >  	}
> >  
> > -	err = xe_ttm_gtt_mgr_init(gt, gt->mem.gtt_mgr, gtt_size);
> > -	if (err)
> > -		return err;
> > -
> >  	return 0;
> >  }
> >  
> > diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
> > index 8f29aba455e0..9d3117fad2e4 100644
> > --- a/drivers/gpu/drm/xe/xe_gt_types.h
> > +++ b/drivers/gpu/drm/xe/xe_gt_types.h
> > @@ -162,8 +162,6 @@ struct xe_gt {
> >  		} vram;
> >  		/** @vram_mgr: VRAM TTM manager */
> >  		struct xe_ttm_vram_mgr *vram_mgr;
> > -		/** @gtt_mr: GTT TTM manager */
> > -		struct xe_ttm_gtt_mgr *gtt_mgr;
> >  		/** @ggtt: Global graphics translation table */
> >  		struct xe_ggtt *ggtt;
> >  	} mem;
> > diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
> > deleted file mode 100644
> > index d1d57cb9c2b8..000000000000
> > --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
> > +++ /dev/null
> > @@ -1,16 +0,0 @@
> > -/* SPDX-License-Identifier: MIT */
> > -/*
> > - * Copyright © 2022 Intel Corporation
> > - */
> > -
> > -#ifndef _XE_TTGM_GTT_MGR_H_
> > -#define _XE_TTGM_GTT_MGR_H_
> > -
> > -#include "xe_ttm_gtt_mgr_types.h"
> > -
> > -struct xe_gt;
> > -
> > -int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
> > -			u64 gtt_size);
> > -
> > -#endif
> > diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
> > deleted file mode 100644
> > index c66737488326..000000000000
> > --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
> > +++ /dev/null
> > @@ -1,18 +0,0 @@
> > -/* SPDX-License-Identifier: MIT */
> > -/*
> > - * Copyright © 2022 Intel Corporation
> > - */
> > -
> > -#ifndef _XE_TTM_GTT_MGR_TYPES_H_
> > -#define _XE_TTM_GTT_MGR_TYPES_H_
> > -
> > -#include <drm/ttm/ttm_device.h>
> > -
> > -struct xe_gt;
> > -
> > -struct xe_ttm_gtt_mgr {
> > -	struct xe_gt *gt;
> > -	struct ttm_resource_manager manager;
> > -};
> > -
> > -#endif
> > diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > similarity index 52%
> > rename from drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> > rename to drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > index 8075781070f2..cf5f4f73d4dc 100644
> > --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> > +++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> > @@ -12,31 +12,24 @@
> >  
> >  #include "xe_bo.h"
> >  #include "xe_gt.h"
> > -#include "xe_ttm_gtt_mgr.h"
> >  
> > -struct xe_ttm_gtt_node {
> > +struct xe_ttm_sys_node {
> >  	struct ttm_buffer_object *tbo;
> >  	struct ttm_range_mgr_node base;
> >  };
> >  
> > -static inline struct xe_ttm_gtt_mgr *
> > -to_gtt_mgr(struct ttm_resource_manager *man)
> > +static inline struct xe_ttm_sys_node *
> > +to_xe_ttm_sys_node(struct ttm_resource *res)
> >  {
> > -	return container_of(man, struct xe_ttm_gtt_mgr, manager);
> > +	return container_of(res, struct xe_ttm_sys_node, base.base);
> >  }
> >  
> > -static inline struct xe_ttm_gtt_node *
> > -to_xe_ttm_gtt_node(struct ttm_resource *res)
> > -{
> > -	return container_of(res, struct xe_ttm_gtt_node, base.base);
> > -}
> > -
> > -static int xe_ttm_gtt_mgr_new(struct ttm_resource_manager *man,
> > +static int xe_ttm_sys_mgr_new(struct ttm_resource_manager *man,
> >  			      struct ttm_buffer_object *tbo,
> >  			      const struct ttm_place *place,
> >  			      struct ttm_resource **res)
> >  {
> > -	struct xe_ttm_gtt_node *node;
> > +	struct xe_ttm_sys_node *node;
> >  	int r;
> >  
> >  	node = kzalloc(struct_size(node, base.mm_nodes, 1), GFP_KERNEL);
> > @@ -66,32 +59,31 @@ static int xe_ttm_gtt_mgr_new(struct ttm_resource_manager *man,
> >  	return r;
> >  }
> >  
> > -static void xe_ttm_gtt_mgr_del(struct ttm_resource_manager *man,
> > +static void xe_ttm_sys_mgr_del(struct ttm_resource_manager *man,
> >  			       struct ttm_resource *res)
> >  {
> > -	struct xe_ttm_gtt_node *node = to_xe_ttm_gtt_node(res);
> > +	struct xe_ttm_sys_node *node = to_xe_ttm_sys_node(res);
> >  
> >  	ttm_resource_fini(man, res);
> >  	kfree(node);
> >  }
> >  
> > -static void xe_ttm_gtt_mgr_debug(struct ttm_resource_manager *man,
> > +static void xe_ttm_sys_mgr_debug(struct ttm_resource_manager *man,
> >  				 struct drm_printer *printer)
> >  {
> >  
> >  }
> >  
> > -static const struct ttm_resource_manager_func xe_ttm_gtt_mgr_func = {
> > -	.alloc = xe_ttm_gtt_mgr_new,
> > -	.free = xe_ttm_gtt_mgr_del,
> > -	.debug = xe_ttm_gtt_mgr_debug
> > +static const struct ttm_resource_manager_func xe_ttm_sys_mgr_func = {
> > +	.alloc = xe_ttm_sys_mgr_new,
> > +	.free = xe_ttm_sys_mgr_del,
> > +	.debug = xe_ttm_sys_mgr_debug
> >  };
> >  
> > -static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
> > +static void ttm_sys_mgr_fini(struct drm_device *drm, void *arg)
> >  {
> > -	struct xe_ttm_gtt_mgr *mgr = arg;
> > -	struct xe_device *xe = gt_to_xe(mgr->gt);
> > -	struct ttm_resource_manager *man = &mgr->manager;
> > +	struct xe_device *xe = (struct xe_device *)arg;
> > +	struct ttm_resource_manager *man = &xe->mem.sys_mgr;
> >  	int err;
> >  
> >  	ttm_resource_manager_set_used(man, false);
> > @@ -104,27 +96,18 @@ static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
> >  	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, NULL);
> >  }
> >  
> > -int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
> > -			u64 gtt_size)
> > +int xe_ttm_sys_mgr_init(struct xe_device *xe)
> >  {
> > -	struct xe_device *xe = gt_to_xe(gt);
> > -	struct ttm_resource_manager *man = &mgr->manager;
> > -	int err;
> > -
> > -	XE_BUG_ON(xe_gt_is_media_type(gt));
> > +	struct ttm_resource_manager *man = &xe->mem.sys_mgr;
> > +	struct sysinfo si;
> > +	u64 gtt_size;
> >  
> > -	mgr->gt = gt;
> > +	si_meminfo(&si);
> > +	gtt_size = (u64)si.totalram * si.mem_unit * 3/4;
> >  	man->use_tt = true;
> > -	man->func = &xe_ttm_gtt_mgr_func;
> > -
> > +	man->func = &xe_ttm_sys_mgr_func;
> >  	ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
> > -
> > -	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> > +	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, man);
> >  	ttm_resource_manager_set_used(man, true);
> > -
> > -	err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
> > -	if (err)
> > -		return err;
> > -
> > -	return 0;
> > +	return drmm_add_action_or_reset(&xe->drm, ttm_sys_mgr_fini, xe);
> >  }
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
  2023-04-03 22:20 Chang, Bruce
@ 2023-04-04  3:44 ` Matthew Brost
  2023-04-04 16:29   ` Rodrigo Vivi
  2023-04-04 18:48 ` Lucas De Marchi
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Brost @ 2023-04-04  3:44 UTC (permalink / raw)
  To: Chang, Bruce; +Cc: Stuart Summers, Vivi, intel-xe, Rodrigo

On Mon, Apr 03, 2023 at 10:20:31PM +0000, Chang, Bruce wrote:
> Currently, unload pvc driver will generate a null dereference
> and the call stack is as below.
> 
> [ 4850.618000] Call Trace:
> [ 4850.620740]  <TASK>
> [ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
> [ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
> [ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
> [ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
> [ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
> [ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
> [ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
> [ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
> [ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
> [ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
> [ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]
> 
> There are a couple issues, but the main one is due to TTM has only
> one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
> 1 TTM_PL_TT each tile. The second will overwrite the first one.
> 
> During unload time, the first tile will reset the TTM_PL_TT manger
> and when the second tile is trying to free Bo and it will generate
> the null reference since the TTM manage is already got reset to 0.
> 
> The fix is to use one global TTM_PL_TT manager.
> 
> v2: make gtt mgr global and change the name to sys_mgr
> 
> Cc: Stuart Summers <stuart.summers@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> ---
>  drivers/gpu/drm/xe/Makefile                   |  2 +-
>  drivers/gpu/drm/xe/xe_device.c                |  2 +
>  drivers/gpu/drm/xe/xe_device.h                |  1 +
>  drivers/gpu/drm/xe/xe_device_types.h          |  2 +
>  drivers/gpu/drm/xe/xe_gt.c                    | 18 -----
>  drivers/gpu/drm/xe/xe_gt_types.h              |  2 -
>  drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h           | 16 -----
>  drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h     | 18 -----
>  .../xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} | 67 +++++++------------
>  9 files changed, 31 insertions(+), 97 deletions(-)
>  delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
>  delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
>  rename drivers/gpu/drm/xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} (52%)
> 
> diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
> index 669c7b6f552c..7e19a15af3dc 100644
> --- a/drivers/gpu/drm/xe/Makefile
> +++ b/drivers/gpu/drm/xe/Makefile
> @@ -83,7 +83,7 @@ xe-y += xe_bb.o \
>  	xe_step.o \
>  	xe_sync.o \
>  	xe_trace.o \
> -	xe_ttm_gtt_mgr.o \
> +	xe_ttm_sys_mgr.o \
>  	xe_ttm_stolen_mgr.o \
>  	xe_ttm_vram_mgr.o \
>  	xe_tuning.o \
> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
> index a79f934e3d2d..4fb9aff27686 100644
> --- a/drivers/gpu/drm/xe/xe_device.c
> +++ b/drivers/gpu/drm/xe/xe_device.c
> @@ -279,6 +279,8 @@ int xe_device_probe(struct xe_device *xe)
>  	if (err)
>  		goto err_irq_shutdown;
>  
> +	xe_ttm_sys_mgr_init(xe);
> +
>  	for_each_gt(gt, xe, id) {
>  		err = xe_gt_init_noalloc(gt);
>  		if (err)
> diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
> index d277f8985f7b..d9d1b09a8e38 100644
> --- a/drivers/gpu/drm/xe/xe_device.h
> +++ b/drivers/gpu/drm/xe/xe_device.h
> @@ -116,4 +116,5 @@ static inline bool xe_device_has_flat_ccs(struct xe_device *xe)
>  }
>  
>  u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
> +int xe_ttm_sys_mgr_init(struct xe_device *xe);
>  #endif
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 88f863edc41c..6fb53a54913f 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -195,6 +195,8 @@ struct xe_device {
>  			/** @mapping: pointer to VRAM mappable space */
>  			void *__iomem mapping;
>  		} vram;
> +		/** @sys_mgr: system TTM manager */
> +		struct ttm_resource_manager sys_mgr;
>  	} mem;
>  
>  	/** @usm: unified memory state */
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index fd7a5b43ba3e..a6ea138a7877 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -36,7 +36,6 @@
>  #include "xe_ring_ops.h"
>  #include "xe_sa.h"
>  #include "xe_sched_job.h"
> -#include "xe_ttm_gtt_mgr.h"
>  #include "xe_ttm_vram_mgr.h"
>  #include "xe_tuning.h"
>  #include "xe_uc.h"
> @@ -77,16 +76,11 @@ int xe_gt_alloc(struct xe_device *xe, struct xe_gt *gt)
>  		if (!gt->mem.vram_mgr)
>  			return -ENOMEM;
>  
> -		gt->mem.gtt_mgr = drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
> -					       GFP_KERNEL);
> -		if (!gt->mem.gtt_mgr)
> -			return -ENOMEM;
>  	} else {
>  		struct xe_gt *full_gt = xe_find_full_gt(gt);
>  
>  		gt->mem.ggtt = full_gt->mem.ggtt;
>  		gt->mem.vram_mgr = full_gt->mem.vram_mgr;
> -		gt->mem.gtt_mgr = full_gt->mem.gtt_mgr;
>  	}
>  
>  	gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0);
> @@ -98,26 +92,14 @@ static int gt_ttm_mgr_init(struct xe_gt *gt)
>  {
>  	struct xe_device *xe = gt_to_xe(gt);
>  	int err;
> -	struct sysinfo si;
> -	u64 gtt_size;
> -
> -	si_meminfo(&si);
> -	gtt_size = (u64)si.totalram * si.mem_unit * 3/4;
>  
>  	if (gt->mem.vram.size) {
>  		err = xe_ttm_vram_mgr_init(gt, gt->mem.vram_mgr);
>  		if (err)
>  			return err;
> -		gtt_size = min(max((XE_DEFAULT_GTT_SIZE_MB << 20),
> -				   (u64)gt->mem.vram.size),
> -			       gtt_size);
>  		xe->info.mem_region_mask |= BIT(gt->info.vram_id) << 1;
>  	}
>  
> -	err = xe_ttm_gtt_mgr_init(gt, gt->mem.gtt_mgr, gtt_size);
> -	if (err)
> -		return err;
> -
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
> index 8f29aba455e0..9d3117fad2e4 100644
> --- a/drivers/gpu/drm/xe/xe_gt_types.h
> +++ b/drivers/gpu/drm/xe/xe_gt_types.h
> @@ -162,8 +162,6 @@ struct xe_gt {
>  		} vram;
>  		/** @vram_mgr: VRAM TTM manager */
>  		struct xe_ttm_vram_mgr *vram_mgr;
> -		/** @gtt_mr: GTT TTM manager */
> -		struct xe_ttm_gtt_mgr *gtt_mgr;
>  		/** @ggtt: Global graphics translation table */
>  		struct xe_ggtt *ggtt;
>  	} mem;
> diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
> deleted file mode 100644
> index d1d57cb9c2b8..000000000000
> --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
> +++ /dev/null
> @@ -1,16 +0,0 @@
> -/* SPDX-License-Identifier: MIT */
> -/*
> - * Copyright © 2022 Intel Corporation
> - */
> -
> -#ifndef _XE_TTGM_GTT_MGR_H_
> -#define _XE_TTGM_GTT_MGR_H_
> -
> -#include "xe_ttm_gtt_mgr_types.h"
> -
> -struct xe_gt;
> -
> -int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
> -			u64 gtt_size);
> -
> -#endif
> diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
> deleted file mode 100644
> index c66737488326..000000000000
> --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
> +++ /dev/null
> @@ -1,18 +0,0 @@
> -/* SPDX-License-Identifier: MIT */
> -/*
> - * Copyright © 2022 Intel Corporation
> - */
> -
> -#ifndef _XE_TTM_GTT_MGR_TYPES_H_
> -#define _XE_TTM_GTT_MGR_TYPES_H_
> -
> -#include <drm/ttm/ttm_device.h>
> -
> -struct xe_gt;
> -
> -struct xe_ttm_gtt_mgr {
> -	struct xe_gt *gt;
> -	struct ttm_resource_manager manager;
> -};
> -
> -#endif
> diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> similarity index 52%
> rename from drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> rename to drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> index 8075781070f2..cf5f4f73d4dc 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
> @@ -12,31 +12,24 @@
>  
>  #include "xe_bo.h"
>  #include "xe_gt.h"
> -#include "xe_ttm_gtt_mgr.h"
>  
> -struct xe_ttm_gtt_node {
> +struct xe_ttm_sys_node {
>  	struct ttm_buffer_object *tbo;
>  	struct ttm_range_mgr_node base;
>  };
>  
> -static inline struct xe_ttm_gtt_mgr *
> -to_gtt_mgr(struct ttm_resource_manager *man)
> +static inline struct xe_ttm_sys_node *
> +to_xe_ttm_sys_node(struct ttm_resource *res)
>  {
> -	return container_of(man, struct xe_ttm_gtt_mgr, manager);
> +	return container_of(res, struct xe_ttm_sys_node, base.base);
>  }
>  
> -static inline struct xe_ttm_gtt_node *
> -to_xe_ttm_gtt_node(struct ttm_resource *res)
> -{
> -	return container_of(res, struct xe_ttm_gtt_node, base.base);
> -}
> -
> -static int xe_ttm_gtt_mgr_new(struct ttm_resource_manager *man,
> +static int xe_ttm_sys_mgr_new(struct ttm_resource_manager *man,
>  			      struct ttm_buffer_object *tbo,
>  			      const struct ttm_place *place,
>  			      struct ttm_resource **res)
>  {
> -	struct xe_ttm_gtt_node *node;
> +	struct xe_ttm_sys_node *node;
>  	int r;
>  
>  	node = kzalloc(struct_size(node, base.mm_nodes, 1), GFP_KERNEL);
> @@ -66,32 +59,31 @@ static int xe_ttm_gtt_mgr_new(struct ttm_resource_manager *man,
>  	return r;
>  }
>  
> -static void xe_ttm_gtt_mgr_del(struct ttm_resource_manager *man,
> +static void xe_ttm_sys_mgr_del(struct ttm_resource_manager *man,
>  			       struct ttm_resource *res)
>  {
> -	struct xe_ttm_gtt_node *node = to_xe_ttm_gtt_node(res);
> +	struct xe_ttm_sys_node *node = to_xe_ttm_sys_node(res);
>  
>  	ttm_resource_fini(man, res);
>  	kfree(node);
>  }
>  
> -static void xe_ttm_gtt_mgr_debug(struct ttm_resource_manager *man,
> +static void xe_ttm_sys_mgr_debug(struct ttm_resource_manager *man,
>  				 struct drm_printer *printer)
>  {
>  
>  }
>  
> -static const struct ttm_resource_manager_func xe_ttm_gtt_mgr_func = {
> -	.alloc = xe_ttm_gtt_mgr_new,
> -	.free = xe_ttm_gtt_mgr_del,
> -	.debug = xe_ttm_gtt_mgr_debug
> +static const struct ttm_resource_manager_func xe_ttm_sys_mgr_func = {
> +	.alloc = xe_ttm_sys_mgr_new,
> +	.free = xe_ttm_sys_mgr_del,
> +	.debug = xe_ttm_sys_mgr_debug
>  };
>  
> -static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
> +static void ttm_sys_mgr_fini(struct drm_device *drm, void *arg)
>  {
> -	struct xe_ttm_gtt_mgr *mgr = arg;
> -	struct xe_device *xe = gt_to_xe(mgr->gt);
> -	struct ttm_resource_manager *man = &mgr->manager;
> +	struct xe_device *xe = (struct xe_device *)arg;
> +	struct ttm_resource_manager *man = &xe->mem.sys_mgr;
>  	int err;
>  
>  	ttm_resource_manager_set_used(man, false);
> @@ -104,27 +96,18 @@ static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
>  	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, NULL);
>  }
>  
> -int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
> -			u64 gtt_size)
> +int xe_ttm_sys_mgr_init(struct xe_device *xe)
>  {
> -	struct xe_device *xe = gt_to_xe(gt);
> -	struct ttm_resource_manager *man = &mgr->manager;
> -	int err;
> -
> -	XE_BUG_ON(xe_gt_is_media_type(gt));
> +	struct ttm_resource_manager *man = &xe->mem.sys_mgr;
> +	struct sysinfo si;
> +	u64 gtt_size;
>  
> -	mgr->gt = gt;
> +	si_meminfo(&si);
> +	gtt_size = (u64)si.totalram * si.mem_unit * 3/4;
>  	man->use_tt = true;
> -	man->func = &xe_ttm_gtt_mgr_func;
> -
> +	man->func = &xe_ttm_sys_mgr_func;
>  	ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
> -
> -	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> +	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, man);
>  	ttm_resource_manager_set_used(man, true);
> -
> -	err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
> -	if (err)
> -		return err;
> -
> -	return 0;
> +	return drmm_add_action_or_reset(&xe->drm, ttm_sys_mgr_fini, xe);
>  }
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
@ 2023-04-03 22:20 Chang, Bruce
  2023-04-04  3:44 ` Matthew Brost
  2023-04-04 18:48 ` Lucas De Marchi
  0 siblings, 2 replies; 11+ messages in thread
From: Chang, Bruce @ 2023-04-03 22:20 UTC (permalink / raw)
  To: intel-xe; +Cc: Stuart Summers, Rodrigo, Chang, Bruce, Vivi

Currently, unload pvc driver will generate a null dereference
and the call stack is as below.

[ 4850.618000] Call Trace:
[ 4850.620740]  <TASK>
[ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
[ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
[ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
[ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
[ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
[ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
[ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
[ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
[ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
[ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
[ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]

There are a couple issues, but the main one is due to TTM has only
one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
1 TTM_PL_TT each tile. The second will overwrite the first one.

During unload time, the first tile will reset the TTM_PL_TT manger
and when the second tile is trying to free Bo and it will generate
the null reference since the TTM manage is already got reset to 0.

The fix is to use one global TTM_PL_TT manager.

v2: make gtt mgr global and change the name to sys_mgr

Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
---
 drivers/gpu/drm/xe/Makefile                   |  2 +-
 drivers/gpu/drm/xe/xe_device.c                |  2 +
 drivers/gpu/drm/xe/xe_device.h                |  1 +
 drivers/gpu/drm/xe/xe_device_types.h          |  2 +
 drivers/gpu/drm/xe/xe_gt.c                    | 18 -----
 drivers/gpu/drm/xe/xe_gt_types.h              |  2 -
 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h           | 16 -----
 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h     | 18 -----
 .../xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} | 67 +++++++------------
 9 files changed, 31 insertions(+), 97 deletions(-)
 delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
 delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
 rename drivers/gpu/drm/xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} (52%)

diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
index 669c7b6f552c..7e19a15af3dc 100644
--- a/drivers/gpu/drm/xe/Makefile
+++ b/drivers/gpu/drm/xe/Makefile
@@ -83,7 +83,7 @@ xe-y += xe_bb.o \
 	xe_step.o \
 	xe_sync.o \
 	xe_trace.o \
-	xe_ttm_gtt_mgr.o \
+	xe_ttm_sys_mgr.o \
 	xe_ttm_stolen_mgr.o \
 	xe_ttm_vram_mgr.o \
 	xe_tuning.o \
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
index a79f934e3d2d..4fb9aff27686 100644
--- a/drivers/gpu/drm/xe/xe_device.c
+++ b/drivers/gpu/drm/xe/xe_device.c
@@ -279,6 +279,8 @@ int xe_device_probe(struct xe_device *xe)
 	if (err)
 		goto err_irq_shutdown;
 
+	xe_ttm_sys_mgr_init(xe);
+
 	for_each_gt(gt, xe, id) {
 		err = xe_gt_init_noalloc(gt);
 		if (err)
diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index d277f8985f7b..d9d1b09a8e38 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -116,4 +116,5 @@ static inline bool xe_device_has_flat_ccs(struct xe_device *xe)
 }
 
 u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
+int xe_ttm_sys_mgr_init(struct xe_device *xe);
 #endif
diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
index 88f863edc41c..6fb53a54913f 100644
--- a/drivers/gpu/drm/xe/xe_device_types.h
+++ b/drivers/gpu/drm/xe/xe_device_types.h
@@ -195,6 +195,8 @@ struct xe_device {
 			/** @mapping: pointer to VRAM mappable space */
 			void *__iomem mapping;
 		} vram;
+		/** @sys_mgr: system TTM manager */
+		struct ttm_resource_manager sys_mgr;
 	} mem;
 
 	/** @usm: unified memory state */
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index fd7a5b43ba3e..a6ea138a7877 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -36,7 +36,6 @@
 #include "xe_ring_ops.h"
 #include "xe_sa.h"
 #include "xe_sched_job.h"
-#include "xe_ttm_gtt_mgr.h"
 #include "xe_ttm_vram_mgr.h"
 #include "xe_tuning.h"
 #include "xe_uc.h"
@@ -77,16 +76,11 @@ int xe_gt_alloc(struct xe_device *xe, struct xe_gt *gt)
 		if (!gt->mem.vram_mgr)
 			return -ENOMEM;
 
-		gt->mem.gtt_mgr = drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
-					       GFP_KERNEL);
-		if (!gt->mem.gtt_mgr)
-			return -ENOMEM;
 	} else {
 		struct xe_gt *full_gt = xe_find_full_gt(gt);
 
 		gt->mem.ggtt = full_gt->mem.ggtt;
 		gt->mem.vram_mgr = full_gt->mem.vram_mgr;
-		gt->mem.gtt_mgr = full_gt->mem.gtt_mgr;
 	}
 
 	gt->ordered_wq = alloc_ordered_workqueue("gt-ordered-wq", 0);
@@ -98,26 +92,14 @@ static int gt_ttm_mgr_init(struct xe_gt *gt)
 {
 	struct xe_device *xe = gt_to_xe(gt);
 	int err;
-	struct sysinfo si;
-	u64 gtt_size;
-
-	si_meminfo(&si);
-	gtt_size = (u64)si.totalram * si.mem_unit * 3/4;
 
 	if (gt->mem.vram.size) {
 		err = xe_ttm_vram_mgr_init(gt, gt->mem.vram_mgr);
 		if (err)
 			return err;
-		gtt_size = min(max((XE_DEFAULT_GTT_SIZE_MB << 20),
-				   (u64)gt->mem.vram.size),
-			       gtt_size);
 		xe->info.mem_region_mask |= BIT(gt->info.vram_id) << 1;
 	}
 
-	err = xe_ttm_gtt_mgr_init(gt, gt->mem.gtt_mgr, gtt_size);
-	if (err)
-		return err;
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/xe/xe_gt_types.h b/drivers/gpu/drm/xe/xe_gt_types.h
index 8f29aba455e0..9d3117fad2e4 100644
--- a/drivers/gpu/drm/xe/xe_gt_types.h
+++ b/drivers/gpu/drm/xe/xe_gt_types.h
@@ -162,8 +162,6 @@ struct xe_gt {
 		} vram;
 		/** @vram_mgr: VRAM TTM manager */
 		struct xe_ttm_vram_mgr *vram_mgr;
-		/** @gtt_mr: GTT TTM manager */
-		struct xe_ttm_gtt_mgr *gtt_mgr;
 		/** @ggtt: Global graphics translation table */
 		struct xe_ggtt *ggtt;
 	} mem;
diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
deleted file mode 100644
index d1d57cb9c2b8..000000000000
--- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
+++ /dev/null
@@ -1,16 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2022 Intel Corporation
- */
-
-#ifndef _XE_TTGM_GTT_MGR_H_
-#define _XE_TTGM_GTT_MGR_H_
-
-#include "xe_ttm_gtt_mgr_types.h"
-
-struct xe_gt;
-
-int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
-			u64 gtt_size);
-
-#endif
diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
deleted file mode 100644
index c66737488326..000000000000
--- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
+++ /dev/null
@@ -1,18 +0,0 @@
-/* SPDX-License-Identifier: MIT */
-/*
- * Copyright © 2022 Intel Corporation
- */
-
-#ifndef _XE_TTM_GTT_MGR_TYPES_H_
-#define _XE_TTM_GTT_MGR_TYPES_H_
-
-#include <drm/ttm/ttm_device.h>
-
-struct xe_gt;
-
-struct xe_ttm_gtt_mgr {
-	struct xe_gt *gt;
-	struct ttm_resource_manager manager;
-};
-
-#endif
diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
similarity index 52%
rename from drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
rename to drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
index 8075781070f2..cf5f4f73d4dc 100644
--- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
+++ b/drivers/gpu/drm/xe/xe_ttm_sys_mgr.c
@@ -12,31 +12,24 @@
 
 #include "xe_bo.h"
 #include "xe_gt.h"
-#include "xe_ttm_gtt_mgr.h"
 
-struct xe_ttm_gtt_node {
+struct xe_ttm_sys_node {
 	struct ttm_buffer_object *tbo;
 	struct ttm_range_mgr_node base;
 };
 
-static inline struct xe_ttm_gtt_mgr *
-to_gtt_mgr(struct ttm_resource_manager *man)
+static inline struct xe_ttm_sys_node *
+to_xe_ttm_sys_node(struct ttm_resource *res)
 {
-	return container_of(man, struct xe_ttm_gtt_mgr, manager);
+	return container_of(res, struct xe_ttm_sys_node, base.base);
 }
 
-static inline struct xe_ttm_gtt_node *
-to_xe_ttm_gtt_node(struct ttm_resource *res)
-{
-	return container_of(res, struct xe_ttm_gtt_node, base.base);
-}
-
-static int xe_ttm_gtt_mgr_new(struct ttm_resource_manager *man,
+static int xe_ttm_sys_mgr_new(struct ttm_resource_manager *man,
 			      struct ttm_buffer_object *tbo,
 			      const struct ttm_place *place,
 			      struct ttm_resource **res)
 {
-	struct xe_ttm_gtt_node *node;
+	struct xe_ttm_sys_node *node;
 	int r;
 
 	node = kzalloc(struct_size(node, base.mm_nodes, 1), GFP_KERNEL);
@@ -66,32 +59,31 @@ static int xe_ttm_gtt_mgr_new(struct ttm_resource_manager *man,
 	return r;
 }
 
-static void xe_ttm_gtt_mgr_del(struct ttm_resource_manager *man,
+static void xe_ttm_sys_mgr_del(struct ttm_resource_manager *man,
 			       struct ttm_resource *res)
 {
-	struct xe_ttm_gtt_node *node = to_xe_ttm_gtt_node(res);
+	struct xe_ttm_sys_node *node = to_xe_ttm_sys_node(res);
 
 	ttm_resource_fini(man, res);
 	kfree(node);
 }
 
-static void xe_ttm_gtt_mgr_debug(struct ttm_resource_manager *man,
+static void xe_ttm_sys_mgr_debug(struct ttm_resource_manager *man,
 				 struct drm_printer *printer)
 {
 
 }
 
-static const struct ttm_resource_manager_func xe_ttm_gtt_mgr_func = {
-	.alloc = xe_ttm_gtt_mgr_new,
-	.free = xe_ttm_gtt_mgr_del,
-	.debug = xe_ttm_gtt_mgr_debug
+static const struct ttm_resource_manager_func xe_ttm_sys_mgr_func = {
+	.alloc = xe_ttm_sys_mgr_new,
+	.free = xe_ttm_sys_mgr_del,
+	.debug = xe_ttm_sys_mgr_debug
 };
 
-static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
+static void ttm_sys_mgr_fini(struct drm_device *drm, void *arg)
 {
-	struct xe_ttm_gtt_mgr *mgr = arg;
-	struct xe_device *xe = gt_to_xe(mgr->gt);
-	struct ttm_resource_manager *man = &mgr->manager;
+	struct xe_device *xe = (struct xe_device *)arg;
+	struct ttm_resource_manager *man = &xe->mem.sys_mgr;
 	int err;
 
 	ttm_resource_manager_set_used(man, false);
@@ -104,27 +96,18 @@ static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
 	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, NULL);
 }
 
-int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
-			u64 gtt_size)
+int xe_ttm_sys_mgr_init(struct xe_device *xe)
 {
-	struct xe_device *xe = gt_to_xe(gt);
-	struct ttm_resource_manager *man = &mgr->manager;
-	int err;
-
-	XE_BUG_ON(xe_gt_is_media_type(gt));
+	struct ttm_resource_manager *man = &xe->mem.sys_mgr;
+	struct sysinfo si;
+	u64 gtt_size;
 
-	mgr->gt = gt;
+	si_meminfo(&si);
+	gtt_size = (u64)si.totalram * si.mem_unit * 3/4;
 	man->use_tt = true;
-	man->func = &xe_ttm_gtt_mgr_func;
-
+	man->func = &xe_ttm_sys_mgr_func;
 	ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
-
-	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
+	ttm_set_driver_manager(&xe->ttm, XE_PL_TT, man);
 	ttm_resource_manager_set_used(man, true);
-
-	err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
-	if (err)
-		return err;
-
-	return 0;
+	return drmm_add_action_or_reset(&xe->drm, ttm_sys_mgr_fini, xe);
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-04-04 20:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-30 15:40 [Intel-xe] [PATCH] drm/xe: fix pvc unload issue Chang, Bruce
2023-03-30 15:42 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-03-30 15:43 ` [Intel-xe] ✗ CI.KUnit: failure " Patchwork
2023-03-31  5:40 ` [Intel-xe] [PATCH] " Matthew Auld
2023-03-31 22:55   ` Chang, Yu bruce
2023-04-03 22:20 Chang, Bruce
2023-04-04  3:44 ` Matthew Brost
2023-04-04 16:29   ` Rodrigo Vivi
2023-04-04 18:48 ` Lucas De Marchi
2023-04-04 19:45   ` Rodrigo Vivi
2023-04-04 20:22     ` Lucas De Marchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).