From: Matthew Auld <matthew.william.auld@gmail.com>
To: "Chang, Bruce" <yu.bruce.chang@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
Date: Fri, 31 Mar 2023 06:40:06 +0100 [thread overview]
Message-ID: <CAM0jSHPvzNKj9Z-sX6BykN4v6JQQJve4e=jYae-ZAEdmb8mPbA@mail.gmail.com> (raw)
In-Reply-To: <20230330154026.4282-1-yu.bruce.chang@intel.com>
On Thu, 30 Mar 2023 at 16:40, Chang, Bruce <yu.bruce.chang@intel.com> wrote:
>
> Currently, unload pvc driver will generate a null dereference
> and the call stack is as below.
>
> [ 4850.618000] Call Trace:
> [ 4850.620740] <TASK>
> [ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
> [ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm]
> [ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy]
> [ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0
> [ 4850.643054] ttm_bo_put+0x38/0x60 [ttm]
> [ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe]
> [ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm]
> [ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe]
> [ 4850.661574] drm_managed_release+0xb5/0x150 [drm]
> [ 4850.666617] drm_dev_release+0x30/0x50 [drm]
> [ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm]
>
> There are a couple issues, but the main one is due to TTM has only
> one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
> 1 TTM_PL_TT each tile. The second will overwrite the first one.
>
> During unload time, the first tile will reset the TTM_PL_TT manger
> and when the second tile is trying to free Bo and it will generate
> the null reference since the TTM manage is already got reset to 0.
>
> The fix is to share the TTM_PL_TT manager and use a count to only
> allow the last instance to release.
Would it make sense to rather move it out of GT init, and just have
one instance in the xe_device? And with that to rather consider
xe->mem.vram_size, instead of gt->mem.vram_size when considering the
potential "GTT" size? I think PL_TT is just system memory that can be
touched by the GPU device (likely has a gtt mapping), so not sure why
that needs to be GT centric.
>
> Cc: Stuart Summers <stuart.summers@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> ---
> drivers/gpu/drm/xe/xe_device_types.h | 3 +++
> drivers/gpu/drm/xe/xe_gt.c | 9 +++++++--
> drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c | 25 +++++++++++++++----------
> 3 files changed, 25 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 88f863edc41c..ea771f120c0f 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -195,6 +195,9 @@ struct xe_device {
> /** @mapping: pointer to VRAM mappable space */
> void *__iomem mapping;
> } vram;
> + /** @gtt_mr: GTT TTM manager */
> + struct xe_ttm_gtt_mgr *gtt_mgr;
> + int instance;
> } mem;
>
> /** @usm: unified memory state */
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index fd7a5b43ba3e..e7ca141478ef 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -77,8 +77,13 @@ int xe_gt_alloc(struct xe_device *xe, struct xe_gt *gt)
> if (!gt->mem.vram_mgr)
> return -ENOMEM;
>
> - gt->mem.gtt_mgr = drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
> - GFP_KERNEL);
> + if (!xe->mem.gtt_mgr) {
> + xe->mem.gtt_mgr =
> + drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
> + GFP_KERNEL);
> + xe->mem.instance = 0;
> + }
> + gt->mem.gtt_mgr = xe->mem.gtt_mgr;
> if (!gt->mem.gtt_mgr)
> return -ENOMEM;
> } else {
> diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> index 8075781070f2..05300c71928e 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> @@ -94,6 +94,9 @@ static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
> struct ttm_resource_manager *man = &mgr->manager;
> int err;
>
> + if (--xe->mem.instance)
> + return;
> +
> ttm_resource_manager_set_used(man, false);
>
> err = ttm_resource_manager_evict_all(&xe->ttm, man);
> @@ -113,18 +116,20 @@ int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
>
> XE_BUG_ON(xe_gt_is_media_type(gt));
>
> - mgr->gt = gt;
> - man->use_tt = true;
> - man->func = &xe_ttm_gtt_mgr_func;
> -
> - ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
> + if (!xe->mem.instance) {
> + mgr->gt = gt;
> + man->use_tt = true;
> + man->func = &xe_ttm_gtt_mgr_func;
>
> - ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> - ttm_resource_manager_set_used(man, true);
> + ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
>
> - err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
> - if (err)
> - return err;
> + ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> + ttm_resource_manager_set_used(man, true);
> + err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
> + if (err)
> + return err;
> + }
> + xe->mem.instance ++;
>
> return 0;
> }
> --
> 2.25.1
>
next prev parent reply other threads:[~2023-03-31 5:40 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-30 15:40 [Intel-xe] [PATCH] drm/xe: fix pvc unload issue Chang, Bruce
2023-03-30 15:42 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-03-30 15:43 ` [Intel-xe] ✗ CI.KUnit: failure " Patchwork
2023-03-31 5:40 ` Matthew Auld [this message]
2023-03-31 22:55 ` [Intel-xe] [PATCH] " Chang, Yu bruce
2023-04-03 22:20 Chang, Bruce
2023-04-04 3:44 ` Matthew Brost
2023-04-04 16:29 ` Rodrigo Vivi
2023-04-04 18:48 ` Lucas De Marchi
2023-04-04 19:45 ` Rodrigo Vivi
2023-04-04 20:22 ` Lucas De Marchi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAM0jSHPvzNKj9Z-sX6BykN4v6JQQJve4e=jYae-ZAEdmb8mPbA@mail.gmail.com' \
--to=matthew.william.auld@gmail.com \
--cc=intel-xe@lists.freedesktop.org \
--cc=yu.bruce.chang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).