All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matt Roper <matthew.d.roper@intel.com>
To: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [PATCH RESEND] drm/i915: Flush buffer pools on driver remove
Date: Wed, 22 Sep 2021 15:24:29 -0700	[thread overview]
Message-ID: <20210922222429.GY3389343@mdroper-desk1.amr.corp.intel.com> (raw)
In-Reply-To: <20210903142320.216705-1-janusz.krzysztofik@linux.intel.com>

On Fri, Sep 03, 2021 at 04:23:20PM +0200, Janusz Krzysztofik wrote:
> In preparation for clean driver release, attempts to drain work queues
> and release freed objects are taken at driver remove time.  However, GT
> buffer pools are now not flushed before the driver release phase.
> Since unused objects may stay there for up to one second, some may
> survive until driver release is attempted.  That can potentially
> explain sporadic then hardly reproducible issues observed at driver
> release time, like non-zero shrink counter or outstanding address space

So just to make sure I'm understanding the description here:
 - We currently do an explicit flush of the buffer pools within the call
   path of drm_driver.release(); this removes all buffers, regardless of
   their age.
 - However there may be other code that runs *earlier* within the
   drm_driver.release() call chain that expects buffer pools have
   already been flushed and are already empty.
 - Since buffer pools auto-flush old buffers once per second in a worker
   thread, there's a small window where if we remove the driver while
   there are still buffers with an age of less than one second, the
   assumptions of the other release code may be violated.

So by moving the flush to driver remove (which executes earlier via the
pci_driver.remove() flow) you're ensuring that all buffers are flushed
before _any_ code in drm_driver.release() executes.

I found the wording of the commit message here somewhat confusing since
it's talking about flushes we do in driver release, but mentions
problems that arise during driver release due to lack of flushing.  You
might want to reword the commit message somewhat to help clarify.
Otherwise, the code change itself looks reasonable to me.

BTW, I do notice that drm_driver.release() in general is technically
deprecated at this point (with a suggestion in the drm_drv.h comments to
switch to using drmm_add_action(), drmm_kmalloc(), etc. to manage the
cleanup of resources).  At some point in the future me may want to
rework the i915 cleanup in general according to that guidance.


Matt

> areas.
> 
> Flush buffer pools on GT remove as a fix.  On driver release, don't
> flush the pools again, just assert that the flush was called and
> nothing added more in between.
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> Resending with Cc: dri-devel@lists.freedesktop.org as requested, and a
> typo in commit description fixed.
> 
> Thanks,
> Janusz
> 
>  drivers/gpu/drm/i915/gt/intel_gt.c             | 2 ++
>  drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c | 2 --
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 62d40c986642..8f322a4ecd87 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -737,6 +737,8 @@ void intel_gt_driver_remove(struct intel_gt *gt)
>  	intel_uc_driver_remove(&gt->uc);
>  
>  	intel_engines_release(gt);
> +
> +	intel_gt_flush_buffer_pool(gt);
>  }
>  
>  void intel_gt_driver_unregister(struct intel_gt *gt)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> index aa0a59c5b614..acc49c56a9f3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> @@ -245,8 +245,6 @@ void intel_gt_fini_buffer_pool(struct intel_gt *gt)
>  	struct intel_gt_buffer_pool *pool = &gt->buffer_pool;
>  	int n;
>  
> -	intel_gt_flush_buffer_pool(gt);
> -
>  	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
>  		GEM_BUG_ON(!list_empty(&pool->cache_list[n]));
>  }
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795

WARNING: multiple messages have this Message-ID (diff)
From: Matt Roper <matthew.d.roper@intel.com>
To: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	Chris Wilson <chris@chris-wilson.co.uk>
Subject: Re: [Intel-gfx] [PATCH RESEND] drm/i915: Flush buffer pools on driver remove
Date: Wed, 22 Sep 2021 15:24:29 -0700	[thread overview]
Message-ID: <20210922222429.GY3389343@mdroper-desk1.amr.corp.intel.com> (raw)
In-Reply-To: <20210903142320.216705-1-janusz.krzysztofik@linux.intel.com>

On Fri, Sep 03, 2021 at 04:23:20PM +0200, Janusz Krzysztofik wrote:
> In preparation for clean driver release, attempts to drain work queues
> and release freed objects are taken at driver remove time.  However, GT
> buffer pools are now not flushed before the driver release phase.
> Since unused objects may stay there for up to one second, some may
> survive until driver release is attempted.  That can potentially
> explain sporadic then hardly reproducible issues observed at driver
> release time, like non-zero shrink counter or outstanding address space

So just to make sure I'm understanding the description here:
 - We currently do an explicit flush of the buffer pools within the call
   path of drm_driver.release(); this removes all buffers, regardless of
   their age.
 - However there may be other code that runs *earlier* within the
   drm_driver.release() call chain that expects buffer pools have
   already been flushed and are already empty.
 - Since buffer pools auto-flush old buffers once per second in a worker
   thread, there's a small window where if we remove the driver while
   there are still buffers with an age of less than one second, the
   assumptions of the other release code may be violated.

So by moving the flush to driver remove (which executes earlier via the
pci_driver.remove() flow) you're ensuring that all buffers are flushed
before _any_ code in drm_driver.release() executes.

I found the wording of the commit message here somewhat confusing since
it's talking about flushes we do in driver release, but mentions
problems that arise during driver release due to lack of flushing.  You
might want to reword the commit message somewhat to help clarify.
Otherwise, the code change itself looks reasonable to me.

BTW, I do notice that drm_driver.release() in general is technically
deprecated at this point (with a suggestion in the drm_drv.h comments to
switch to using drmm_add_action(), drmm_kmalloc(), etc. to manage the
cleanup of resources).  At some point in the future me may want to
rework the i915 cleanup in general according to that guidance.


Matt

> areas.
> 
> Flush buffer pools on GT remove as a fix.  On driver release, don't
> flush the pools again, just assert that the flush was called and
> nothing added more in between.
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> Resending with Cc: dri-devel@lists.freedesktop.org as requested, and a
> typo in commit description fixed.
> 
> Thanks,
> Janusz
> 
>  drivers/gpu/drm/i915/gt/intel_gt.c             | 2 ++
>  drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c | 2 --
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
> index 62d40c986642..8f322a4ecd87 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt.c
> @@ -737,6 +737,8 @@ void intel_gt_driver_remove(struct intel_gt *gt)
>  	intel_uc_driver_remove(&gt->uc);
>  
>  	intel_engines_release(gt);
> +
> +	intel_gt_flush_buffer_pool(gt);
>  }
>  
>  void intel_gt_driver_unregister(struct intel_gt *gt)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> index aa0a59c5b614..acc49c56a9f3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c
> @@ -245,8 +245,6 @@ void intel_gt_fini_buffer_pool(struct intel_gt *gt)
>  	struct intel_gt_buffer_pool *pool = &gt->buffer_pool;
>  	int n;
>  
> -	intel_gt_flush_buffer_pool(gt);
> -
>  	for (n = 0; n < ARRAY_SIZE(pool->cache_list); n++)
>  		GEM_BUG_ON(!list_empty(&pool->cache_list[n]));
>  }
> -- 
> 2.25.1
> 

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795

  parent reply	other threads:[~2021-09-22 22:24 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-03 14:23 [PATCH RESEND] drm/i915: Flush buffer pools on driver remove Janusz Krzysztofik
2021-09-03 14:23 ` [Intel-gfx] " Janusz Krzysztofik
2021-09-03 15:13 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Flush buffer pools on driver remove (rev3) Patchwork
2021-09-03 16:39 ` [Intel-gfx] ✓ Fi.CI.IGT: " Patchwork
2021-09-22 22:24 ` Matt Roper [this message]
2021-09-22 22:24   ` [Intel-gfx] [PATCH RESEND] drm/i915: Flush buffer pools on driver remove Matt Roper
2021-09-23 13:07   ` Janusz Krzysztofik
2021-09-23 13:07     ` [Intel-gfx] " Janusz Krzysztofik
2021-09-23 13:51     ` Matt Roper
2021-09-23 13:51       ` [Intel-gfx] " Matt Roper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210922222429.GY3389343@mdroper-desk1.amr.corp.intel.com \
    --to=matthew.d.roper@intel.com \
    --cc=chris@chris-wilson.co.uk \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=janusz.krzysztofik@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.