intel-xe.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Lucas De Marchi <lucas.demarchi@intel.com>
To: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>,
	"Chang, Bruce" <yu.bruce.chang@intel.com>,
	intel-xe@lists.freedesktop.org
Subject: Re: [Intel-xe] [PATCH] drm/xe: fix pvc unload issue
Date: Tue, 4 Apr 2023 13:22:49 -0700	[thread overview]
Message-ID: <20230404202249.t4rn476e7uv2767q@ldmartin-desk2.lan> (raw)
In-Reply-To: <ZCx+Qfwr7KSaKGiw@intel.com>

On Tue, Apr 04, 2023 at 03:45:05PM -0400, Rodrigo Vivi wrote:
>On Tue, Apr 04, 2023 at 11:48:39AM -0700, Lucas De Marchi wrote:
>> On Mon, Apr 03, 2023 at 10:20:31PM +0000, Chang, Bruce wrote:
>> > Currently, unload pvc driver will generate a null dereference
>> > and the call stack is as below.
>> >
>> > [ 4850.618000] Call Trace:
>> > [ 4850.620740]  <TASK>
>> > [ 4850.623134]  ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm]
>> > [ 4850.628661]  ttm_bo_release+0x154/0x2c0 [ttm]
>> > [ 4850.633317]  ? drm_buddy_fini+0x62/0x80 [drm_buddy]
>> > [ 4850.638487]  ? __kmem_cache_free+0x27d/0x2c0
>> > [ 4850.643054]  ttm_bo_put+0x38/0x60 [ttm]
>> > [ 4850.647190]  xe_gem_object_free+0x1f/0x30 [xe]
>> > [ 4850.651945]  drm_gem_object_free+0x1e/0x30 [drm]
>> > [ 4850.656904]  ggtt_fini_noalloc+0x9d/0xe0 [xe]
>> > [ 4850.661574]  drm_managed_release+0xb5/0x150 [drm]
>> > [ 4850.666617]  drm_dev_release+0x30/0x50 [drm]
>> > [ 4850.671209]  devm_drm_dev_init_release+0x3c/0x60 [drm]
>> >
>> > There are a couple issues, but the main one is due to TTM has only
>> > one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup
>> > 1 TTM_PL_TT each tile. The second will overwrite the first one.
>> >
>> > During unload time, the first tile will reset the TTM_PL_TT manger
>> > and when the second tile is trying to free Bo and it will generate
>> > the null reference since the TTM manage is already got reset to 0.
>> >
>> > The fix is to use one global TTM_PL_TT manager.
>> >
>> > v2: make gtt mgr global and change the name to sys_mgr
>> >
>> > Cc: Stuart Summers <stuart.summers@intel.com>
>> > Cc: Matthew Brost <matthew.brost@intel.com>
>> > Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com>
>> > Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
>> > ---
>> > drivers/gpu/drm/xe/Makefile                   |  2 +-
>> > drivers/gpu/drm/xe/xe_device.c                |  2 +
>> > drivers/gpu/drm/xe/xe_device.h                |  1 +
>> > drivers/gpu/drm/xe/xe_device_types.h          |  2 +
>> > drivers/gpu/drm/xe/xe_gt.c                    | 18 -----
>> > drivers/gpu/drm/xe/xe_gt_types.h              |  2 -
>> > drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h           | 16 -----
>> > drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h     | 18 -----
>> > .../xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} | 67 +++++++------------
>> > 9 files changed, 31 insertions(+), 97 deletions(-)
>> > delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr.h
>> > delete mode 100644 drivers/gpu/drm/xe/xe_ttm_gtt_mgr_types.h
>> > rename drivers/gpu/drm/xe/{xe_ttm_gtt_mgr.c => xe_ttm_sys_mgr.c} (52%)
>> >
>> > diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile
>> > index 669c7b6f552c..7e19a15af3dc 100644
>> > --- a/drivers/gpu/drm/xe/Makefile
>> > +++ b/drivers/gpu/drm/xe/Makefile
>> > @@ -83,7 +83,7 @@ xe-y += xe_bb.o \
>> > 	xe_step.o \
>> > 	xe_sync.o \
>> > 	xe_trace.o \
>> > -	xe_ttm_gtt_mgr.o \
>> > +	xe_ttm_sys_mgr.o \
>> > 	xe_ttm_stolen_mgr.o \
>> > 	xe_ttm_vram_mgr.o \
>> > 	xe_tuning.o \
>> > diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c
>> > index a79f934e3d2d..4fb9aff27686 100644
>> > --- a/drivers/gpu/drm/xe/xe_device.c
>> > +++ b/drivers/gpu/drm/xe/xe_device.c
>> > @@ -279,6 +279,8 @@ int xe_device_probe(struct xe_device *xe)
>> > 	if (err)
>> > 		goto err_irq_shutdown;
>> >
>> > +	xe_ttm_sys_mgr_init(xe);
>> > +
>> > 	for_each_gt(gt, xe, id) {
>> > 		err = xe_gt_init_noalloc(gt);
>> > 		if (err)
>> > diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
>> > index d277f8985f7b..d9d1b09a8e38 100644
>> > --- a/drivers/gpu/drm/xe/xe_device.h
>> > +++ b/drivers/gpu/drm/xe/xe_device.h
>> > @@ -116,4 +116,5 @@ static inline bool xe_device_has_flat_ccs(struct xe_device *xe)
>> > }
>> >
>> > u32 xe_device_ccs_bytes(struct xe_device *xe, u64 size);
>> > +int xe_ttm_sys_mgr_init(struct xe_device *xe);
>>
>>   CC [M]  drivers/gpu/drm/xe/tests/xe_bo_test.o
>> ../drivers/gpu/drm/xe/xe_ttm_sys_mgr.c:99:5: error: no previous prototype for ‘xe_ttm_sys_mgr_init’ [-Werror=missing-prototypes]
>>    99 | int xe_ttm_sys_mgr_init(struct xe_device *xe)
>>       |     ^~~~~~~~~~~~~~~~~~~
>>
>
>that's awkward... I didn't get issue here, not CI got it...

I enable W=1 every time I'm going to push so I get this kind of error.

We used to have it in the gitlab CI, but lost when migrated to the
mailing list.  I talked with Ryszard to setup CI to run
"dev-provided-checks" so we can enable this back and probably run some
more. Hopefully we can add it back soon.

FWIW these are the commands I use:

	alias m='make -j$(nproc)'
	alias m64='m O=build64'
	m64
	./scripts/config --file build64/.config --disable CONFIG_DRM_XE_DISPLAY && \
		m64 modules_prepare && m64 M=drivers/gpu/drm/xe W=1
	./scripts/config --file build64/.config --enable  CONFIG_DRM_XE_DISPLAY

which does the trick of "build everything, rebuild only xe.ko with
display disabled and W=1 (otherwise it fails), then move
CONFIG_DRM_XE_DISPLAY back on.

Lucas De Marchi

>
>>
>> couple of issues:
>>
>> 1) you moved the implementation to a new file and didn't include
>> xe_device.h, creating the build issue above
>
>I was wondering about gcc version, but this wouldn't be magic in older gcc...
>
>> 2) The decl shouldn't be here though. You need a xe_ttm_sys_mgr.h, just
>> like we have xe_ttm_stolen_mgr.h, xe_ttm_vram_mgr.h, xe_ttm_vram_mgr_types.h
>
>yeap, that sounds the way to go...
>
>>
>> Lucas De Marchi

  reply	other threads:[~2023-04-04 20:22 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-03 22:20 [Intel-xe] [PATCH] drm/xe: fix pvc unload issue Chang, Bruce
2023-04-03 22:31 ` [Intel-xe] ✓ CI.Patch_applied: success for drm/xe: fix pvc unload issue (rev2) Patchwork
2023-04-03 22:32 ` [Intel-xe] ✓ CI.KUnit: " Patchwork
2023-04-03 22:36 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-04-03 22:57 ` [Intel-xe] ○ CI.BAT: info " Patchwork
2023-04-04  3:44 ` [Intel-xe] [PATCH] drm/xe: fix pvc unload issue Matthew Brost
2023-04-04 16:29   ` Rodrigo Vivi
2023-04-04 18:48 ` Lucas De Marchi
2023-04-04 19:45   ` Rodrigo Vivi
2023-04-04 20:22     ` Lucas De Marchi [this message]
  -- strict thread matches above, loose matches on Subject: below --
2023-03-30 15:40 Chang, Bruce
2023-03-31  5:40 ` Matthew Auld
2023-03-31 22:55   ` Chang, Yu bruce

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230404202249.t4rn476e7uv2767q@ldmartin-desk2.lan \
    --to=lucas.demarchi@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=rodrigo.vivi@intel.com \
    --cc=stuart.summers@intel.com \
    --cc=yu.bruce.chang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).