intel-xe.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Auld <matthew.william.auld@gmail.com>
To: "Chang, Bruce" <yu.bruce.chang@intel.com>
Cc: intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
Date: Fri, 31 Mar 2023 06:40:06 +0100	[thread overview]
Message-ID: <CAM0jSHPvzNKj9Z-sX6BykN4v6JQQJve4e=jYae-ZAEdmb8mPbA@mail.gmail.com> (raw)
In-Reply-To: <20230330154026.4282-1-yu.bruce.chang@intel.com>

On Thu, 30 Mar 2023 at 16:40, Chang, Bruce <yu.bruce.chang@intel.com> wrote:
>
> Currently, unload pvc driver will generate a null dereference
> and the call stack is as below.
>
> [ 4850.618000] Call Trace:
> [ 4850.620740]  <TASK>
> [ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
> [ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
> [ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
> [ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
> [ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
> [ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
> [ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
> [ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
> [ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
> [ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
> [ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]
>
> There are a couple issues, but the main one is due to TTM has only
> one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
> 1 TTM_PL_TT each tile. The second will overwrite the first one.
>
> During unload time, the first tile will reset the TTM_PL_TT manger
> and when the second tile is trying to free Bo and it will generate
> the null reference since the TTM manage is already got reset to 0.
>
> The fix is to share the TTM_PL_TT manager and use a count to only
> allow the last instance to release.

Would it make sense to rather move it out of GT init, and just have
one instance in the xe_device? And with that to rather consider
xe->mem.vram_size, instead of gt->mem.vram_size when considering the
potential "GTT" size? I think PL_TT is just system memory that can be
touched by the GPU device (likely has a gtt mapping), so not sure why
that needs to be GT centric.

>
> Cc: Stuart Summers <stuart.summers@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> ---
>  drivers/gpu/drm/xe/xe_device_types.h |  3 +++
>  drivers/gpu/drm/xe/xe_gt.c           |  9 +++++++--
>  drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c  | 25 +++++++++++++++----------
>  3 files changed, 25 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_device_types.h b/drivers/gpu/drm/xe/xe_device_types.h
> index 88f863edc41c..ea771f120c0f 100644
> --- a/drivers/gpu/drm/xe/xe_device_types.h
> +++ b/drivers/gpu/drm/xe/xe_device_types.h
> @@ -195,6 +195,9 @@ struct xe_device {
>                         /** @mapping: pointer to VRAM mappable space */
>                         void *__iomem mapping;
>                 } vram;
> +               /** @gtt_mr: GTT TTM manager */
> +               struct xe_ttm_gtt_mgr *gtt_mgr;
> +               int instance;
>         } mem;
>
>         /** @usm: unified memory state */
> diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
> index fd7a5b43ba3e..e7ca141478ef 100644
> --- a/drivers/gpu/drm/xe/xe_gt.c
> +++ b/drivers/gpu/drm/xe/xe_gt.c
> @@ -77,8 +77,13 @@ int xe_gt_alloc(struct xe_device *xe, struct xe_gt *gt)
>                 if (!gt->mem.vram_mgr)
>                         return -ENOMEM;
>
> -               gt->mem.gtt_mgr = drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
> -                                              GFP_KERNEL);
> +               if (!xe->mem.gtt_mgr) {
> +                       xe->mem.gtt_mgr =
> +                               drmm_kzalloc(drm, sizeof(*gt->mem.gtt_mgr),
> +                                            GFP_KERNEL);
> +                       xe->mem.instance = 0;
> +               }
> +               gt->mem.gtt_mgr = xe->mem.gtt_mgr;
>                 if (!gt->mem.gtt_mgr)
>                         return -ENOMEM;
>         } else {
> diff --git a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> index 8075781070f2..05300c71928e 100644
> --- a/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> +++ b/drivers/gpu/drm/xe/xe_ttm_gtt_mgr.c
> @@ -94,6 +94,9 @@ static void ttm_gtt_mgr_fini(struct drm_device *drm, void *arg)
>         struct ttm_resource_manager *man = &mgr->manager;
>         int err;
>
> +       if (--xe->mem.instance)
> +               return;
> +
>         ttm_resource_manager_set_used(man, false);
>
>         err = ttm_resource_manager_evict_all(&xe->ttm, man);
> @@ -113,18 +116,20 @@ int xe_ttm_gtt_mgr_init(struct xe_gt *gt, struct xe_ttm_gtt_mgr *mgr,
>
>         XE_BUG_ON(xe_gt_is_media_type(gt));
>
> -       mgr->gt = gt;
> -       man->use_tt = true;
> -       man->func = &xe_ttm_gtt_mgr_func;
> -
> -       ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
> +       if (!xe->mem.instance) {
> +               mgr->gt = gt;
> +               man->use_tt = true;
> +               man->func = &xe_ttm_gtt_mgr_func;
>
> -       ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> -       ttm_resource_manager_set_used(man, true);
> +               ttm_resource_manager_init(man, &xe->ttm, gtt_size >> PAGE_SHIFT);
>
> -       err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
> -       if (err)
> -               return err;
> +               ttm_set_driver_manager(&xe->ttm, XE_PL_TT, &mgr->manager);
> +               ttm_resource_manager_set_used(man, true);
> +               err = drmm_add_action_or_reset(&xe->drm, ttm_gtt_mgr_fini, mgr);
> +               if (err)
> +                       return err;
> +       }
> +       xe->mem.instance ++;
>
>         return 0;
>  }
> --
> 2.25.1
>

  parent reply	other threads:[~2023-03-31  5:40 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-30 15:40 [Intel-xe] [PATCH] drm/xe: fix pvc unload issue Chang, Bruce
2023-03-30 15:42 ` [Intel-xe] ✓ CI.Patch_applied: success for " Patchwork
2023-03-30 15:43 ` [Intel-xe] ✗ CI.KUnit: failure " Patchwork
2023-03-31  5:40 ` Matthew Auld [this message]
2023-03-31 22:55   ` [Intel-xe] [PATCH] " Chang, Yu bruce
2023-04-03 22:20 Chang, Bruce
2023-04-04  3:44 ` Matthew Brost
2023-04-04 16:29   ` Rodrigo Vivi
2023-04-04 18:48 ` Lucas De Marchi
2023-04-04 19:45   ` Rodrigo Vivi
2023-04-04 20:22     ` Lucas De Marchi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAM0jSHPvzNKj9Z-sX6BykN4v6JQQJve4e=jYae-ZAEdmb8mPbA@mail.gmail.com' \
    --to=matthew.william.auld@gmail.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=yu.bruce.chang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).