[Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines
@ 2020-02-05 12:11 Chris Wilson
  2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 12:11 UTC (permalink / raw)
  To: intel-gfx

If we have a set of active engines marked as being non-persistent, we
lose track of those if the user replaces those engines with
I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
non-persistent requests are terminated if they are no longer being
tracked by the user's context (in order to prevent a lost request
causing an untracked and so unstoppable GPU hang), we need to apply the
same context cancellation upon changing engines.

Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
Testcase: XXX
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 52a749691a8d..5d4093266103 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1623,6 +1623,11 @@ set_engines(struct i915_gem_context *ctx,
 	}
 
 replace:
+	/* Flush stale requests off the old engines if required */
+	if (!i915_gem_context_is_persistent(ctx) ||
+	    !i915_modparams.enable_hangcheck)
+		kill_context(ctx);
+
 	mutex_lock(&ctx->engines_mutex);
 	if (args->size)
 		i915_gem_context_set_user_engines(ctx);
-- 
2.25.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
  2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
@ 2020-02-05 12:13 ` Chris Wilson
  2020-02-05 16:22   ` Tvrtko Ursulin
  2020-02-05 17:17 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2) Patchwork
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 12:13 UTC (permalink / raw)
  To: intel-gfx

If we have a set of active engines marked as being non-persistent, we
lose track of those if the user replaces those engines with
I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
non-persistent requests are terminated if they are no longer being
tracked by the user's context (in order to prevent a lost request
causing an untracked and so unstoppable GPU hang), we need to apply the
same context cancellation upon changing engines.

Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
Testcase: XXX
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 52a749691a8d..20f1d3e0221f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
 
 replace:
 	mutex_lock(&ctx->engines_mutex);
+
+	/* Flush stale requests off the old engines if required */
+	if (!i915_gem_context_is_persistent(ctx) ||
+	    !i915_modparams.enable_hangcheck)
+		kill_context(ctx);
+
 	if (args->size)
 		i915_gem_context_set_user_engines(ctx);
 	else
 		i915_gem_context_clear_user_engines(ctx);
 	set.engines = rcu_replace_pointer(ctx->engines, set.engines, 1);
+
 	mutex_unlock(&ctx->engines_mutex);
 
 	call_rcu(&set.engines->rcu, free_engines_rcu);
-- 
2.25.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
  2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
@ 2020-02-05 16:22   ` Tvrtko Ursulin
  2020-02-05 16:44     ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2020-02-05 16:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 05/02/2020 12:13, Chris Wilson wrote:
> If we have a set of active engines marked as being non-persistent, we
> lose track of those if the user replaces those engines with
> I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
> non-persistent requests are terminated if they are no longer being
> tracked by the user's context (in order to prevent a lost request
> causing an untracked and so unstoppable GPU hang), we need to apply the
> same context cancellation upon changing engines.
> 
> Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
> Testcase: XXX
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
>   1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 52a749691a8d..20f1d3e0221f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
>   
>   replace:
>   	mutex_lock(&ctx->engines_mutex);
> +
> +	/* Flush stale requests off the old engines if required */
> +	if (!i915_gem_context_is_persistent(ctx) ||
> +	    !i915_modparams.enable_hangcheck)
> +		kill_context(ctx);

Is the negative effect of this is legit contexts can't keep submitting 
and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but 
still. Might break legitimate userspace. Not that I offer solutions.. :( 
Banning changing engines once context went non-persistent? That too can 
break someone.

Regards,

Tvrtko

> +
>   	if (args->size)
>   		i915_gem_context_set_user_engines(ctx);
>   	else
>   		i915_gem_context_clear_user_engines(ctx);
>   	set.engines = rcu_replace_pointer(ctx->engines, set.engines, 1);
> +
>   	mutex_unlock(&ctx->engines_mutex);
>   
>   	call_rcu(&set.engines->rcu, free_engines_rcu);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
  2020-02-05 16:22   ` Tvrtko Ursulin
@ 2020-02-05 16:44     ` Chris Wilson
  2020-02-05 18:33       ` Tvrtko Ursulin
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 16:44 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-02-05 16:22:34)
> 
> 
> On 05/02/2020 12:13, Chris Wilson wrote:
> > If we have a set of active engines marked as being non-persistent, we
> > lose track of those if the user replaces those engines with
> > I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
> > non-persistent requests are terminated if they are no longer being
> > tracked by the user's context (in order to prevent a lost request
> > causing an untracked and so unstoppable GPU hang), we need to apply the
> > same context cancellation upon changing engines.
> > 
> > Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
> > Testcase: XXX
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
> >   1 file changed, 7 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 52a749691a8d..20f1d3e0221f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
> >   
> >   replace:
> >       mutex_lock(&ctx->engines_mutex);
> > +
> > +     /* Flush stale requests off the old engines if required */
> > +     if (!i915_gem_context_is_persistent(ctx) ||
> > +         !i915_modparams.enable_hangcheck)
> > +             kill_context(ctx);
> 
> Is the negative effect of this is legit contexts can't keep submitting 
> and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but 
> still. Might break legitimate userspace. Not that I offer solutions.. :( 
> Banning changing engines once context went non-persistent? That too can 
> break someone.

It closes the hole we have. To do otherwise, we need to keep track of
the old engines. Not an impossible task, certainly inconvenient.

struct old_engines {
	struct i915_active active;
	struct list_head link;
	struct i915_gem_context *ctx;
	void *engines;
	int num_engines;
};

With a list+spinlock in the ctx that we can work in kill_context.

The biggest catch there is actually worrying about attaching the active
to already executing request, and making sure the coupling doesn't bug
on a concurrent completion. Hmm, it's just a completion callback, but
more convenient to use a ready made one.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2)
  2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
  2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
@ 2020-02-05 17:17 ` Patchwork
  2020-02-05 19:39 ` [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-02-05 17:17 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2)
URL   : https://patchwork.freedesktop.org/series/73023/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_7869 -> Patchwork_16434
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/index.html

Known issues
------------

  Here are the changes found in Patchwork_16434 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_close_race@basic-threads:
    - fi-byt-j1900:       [PASS][1] -> [INCOMPLETE][2] ([i915#45])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-byt-j1900/igt@gem_close_race@basic-threads.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-byt-j1900/igt@gem_close_race@basic-threads.html

  * igt@i915_selftest@live_execlists:
    - fi-icl-y:           [PASS][3] -> [DMESG-FAIL][4] ([fdo#108569])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-icl-y/igt@i915_selftest@live_execlists.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-icl-y/igt@i915_selftest@live_execlists.html

  * igt@i915_selftest@live_gem_contexts:
    - fi-kbl-x1275:       [PASS][5] -> [INCOMPLETE][6] ([i915#504])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-kbl-x1275/igt@i915_selftest@live_gem_contexts.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-kbl-x1275/igt@i915_selftest@live_gem_contexts.html

  * igt@i915_selftest@live_gtt:
    - fi-bdw-5557u:       [PASS][7] -> [TIMEOUT][8] ([fdo#112271])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-bdw-5557u/igt@i915_selftest@live_gtt.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-bdw-5557u/igt@i915_selftest@live_gtt.html

  * igt@prime_self_import@basic-llseek-bad:
    - fi-tgl-y:           [PASS][9] -> [DMESG-WARN][10] ([CI#94] / [i915#402]) +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-tgl-y/igt@prime_self_import@basic-llseek-bad.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-tgl-y/igt@prime_self_import@basic-llseek-bad.html

  
#### Possible fixes ####

  * igt@gem_exec_parallel@fds:
    - fi-icl-u3:          [INCOMPLETE][11] -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-icl-u3/igt@gem_exec_parallel@fds.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-icl-u3/igt@gem_exec_parallel@fds.html

  * igt@gem_exec_suspend@basic-s4-devices:
    - fi-tgl-y:           [FAIL][13] ([CI#94]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-tgl-y/igt@gem_exec_suspend@basic-s4-devices.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-tgl-y/igt@gem_exec_suspend@basic-s4-devices.html

  * igt@i915_getparams_basic@basic-subslice-total:
    - fi-tgl-y:           [DMESG-WARN][15] ([CI#94] / [i915#402]) -> [PASS][16] +1 similar issue
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-tgl-y/igt@i915_getparams_basic@basic-subslice-total.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-tgl-y/igt@i915_getparams_basic@basic-subslice-total.html

  * igt@i915_module_load@reload:
    - fi-skl-6770hq:      [DMESG-WARN][17] ([i915#92]) -> [PASS][18] +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-skl-6770hq/igt@i915_module_load@reload.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-skl-6770hq/igt@i915_module_load@reload.html

  * igt@i915_selftest@live_blt:
    - fi-ivb-3770:        [DMESG-FAIL][19] ([i915#725]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-ivb-3770/igt@i915_selftest@live_blt.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-ivb-3770/igt@i915_selftest@live_blt.html

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-kbl-7500u:       [FAIL][21] ([fdo#109635] / [i915#217]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-kbl-7500u/igt@kms_chamelium@hdmi-crc-fast.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-kbl-7500u/igt@kms_chamelium@hdmi-crc-fast.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       [FAIL][23] ([fdo#111096] / [i915#323]) -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence:
    - fi-skl-6770hq:      [SKIP][25] ([fdo#109271]) -> [PASS][26] +4 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-skl-6770hq/igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-skl-6770hq/igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-c:
    - fi-skl-6770hq:      [DMESG-WARN][27] ([i915#106] / [i915#188]) -> [PASS][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-skl-6770hq/igt@kms_pipe_crc_basic@read-crc-pipe-c.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-skl-6770hq/igt@kms_pipe_crc_basic@read-crc-pipe-c.html

  
#### Warnings ####

  * igt@i915_pm_rpm@module-reload:
    - fi-skl-6770hq:      [DMESG-WARN][29] ([i915#92]) -> [FAIL][30] ([i915#178])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-skl-6770hq/igt@i915_pm_rpm@module-reload.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-skl-6770hq/igt@i915_pm_rpm@module-reload.html

  * igt@i915_selftest@live_blt:
    - fi-hsw-4770:        [DMESG-FAIL][31] ([i915#553]) -> [DMESG-FAIL][32] ([i915#553] / [i915#725])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-hsw-4770/igt@i915_selftest@live_blt.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-hsw-4770/igt@i915_selftest@live_blt.html

  
  [CI#94]: https://gitlab.freedesktop.org/gfx-ci/i915-infra/issues/94
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109635]: https://bugs.freedesktop.org/show_bug.cgi?id=109635
  [fdo#111096]: https://bugs.freedesktop.org/show_bug.cgi?id=111096
  [fdo#112271]: https://bugs.freedesktop.org/show_bug.cgi?id=112271
  [i915#106]: https://gitlab.freedesktop.org/drm/intel/issues/106
  [i915#178]: https://gitlab.freedesktop.org/drm/intel/issues/178
  [i915#188]: https://gitlab.freedesktop.org/drm/intel/issues/188
  [i915#217]: https://gitlab.freedesktop.org/drm/intel/issues/217
  [i915#323]: https://gitlab.freedesktop.org/drm/intel/issues/323
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#45]: https://gitlab.freedesktop.org/drm/intel/issues/45
  [i915#504]: https://gitlab.freedesktop.org/drm/intel/issues/504
  [i915#553]: https://gitlab.freedesktop.org/drm/intel/issues/553
  [i915#725]: https://gitlab.freedesktop.org/drm/intel/issues/725
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92


Participating hosts (49 -> 39)
------------------------------

  Additional (2): fi-kbl-7560u fi-gdg-551 
  Missing    (12): fi-kbl-soraka fi-hsw-4200u fi-bdw-gvtdvm fi-bsw-cyan fi-ctg-p8600 fi-elk-e7500 fi-bdw-samus fi-byt-n2820 fi-byt-clapper fi-bsw-nick fi-skl-6600u fi-snb-2600 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_7869 -> Patchwork_16434

  CI-20190529: 20190529
  CI_DRM_7869: db0579be255412f38a450c3c577f8d10f1195034 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5419: 44913a91e77434b03001bb9ea53216cd03c476e6 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_16434: ff4baf44b0269a86adf6319dafac50877260ce2f @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ff4baf44b026 drm/i915/gem: Don't leak non-persistent requests on changing engines

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
  2020-02-05 16:44     ` Chris Wilson
@ 2020-02-05 18:33       ` Tvrtko Ursulin
  2020-02-05 18:39         ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2020-02-05 18:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 05/02/2020 16:44, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-02-05 16:22:34)
>> On 05/02/2020 12:13, Chris Wilson wrote:
>>> If we have a set of active engines marked as being non-persistent, we
>>> lose track of those if the user replaces those engines with
>>> I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
>>> non-persistent requests are terminated if they are no longer being
>>> tracked by the user's context (in order to prevent a lost request
>>> causing an untracked and so unstoppable GPU hang), we need to apply the
>>> same context cancellation upon changing engines.
>>>
>>> Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
>>> Testcase: XXX
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
>>>    1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> index 52a749691a8d..20f1d3e0221f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
>>>    
>>>    replace:
>>>        mutex_lock(&ctx->engines_mutex);
>>> +
>>> +     /* Flush stale requests off the old engines if required */
>>> +     if (!i915_gem_context_is_persistent(ctx) ||
>>> +         !i915_modparams.enable_hangcheck)
>>> +             kill_context(ctx);
>>
>> Is the negative effect of this is legit contexts can't keep submitting
>> and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but
>> still. Might break legitimate userspace. Not that I offer solutions.. :(
>> Banning changing engines once context went non-persistent? That too can
>> break someone.
> 
> It closes the hole we have. To do otherwise, we need to keep track of
> the old engines. Not an impossible task, certainly inconvenient.
> 
> struct old_engines {
> 	struct i915_active active;
> 	struct list_head link;
> 	struct i915_gem_context *ctx;
> 	void *engines;
> 	int num_engines;
> };
> 
> With a list+spinlock in the ctx that we can work in kill_context.
> 
> The biggest catch there is actually worrying about attaching the active
> to already executing request, and making sure the coupling doesn't bug
> on a concurrent completion. Hmm, it's just a completion callback, but
> more convenient to use a ready made one.

What would you do with old engines? We don't have a mechanism to mark 
intel_context closed. Hm, right, it would get unreachable by definition. 
But how to terminate it if it doesn't play nicely?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
  2020-02-05 18:33       ` Tvrtko Ursulin
@ 2020-02-05 18:39         ` Chris Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 18:39 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-02-05 18:33:57)
> 
> On 05/02/2020 16:44, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-02-05 16:22:34)
> >> On 05/02/2020 12:13, Chris Wilson wrote:
> >>> If we have a set of active engines marked as being non-persistent, we
> >>> lose track of those if the user replaces those engines with
> >>> I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
> >>> non-persistent requests are terminated if they are no longer being
> >>> tracked by the user's context (in order to prevent a lost request
> >>> causing an untracked and so unstoppable GPU hang), we need to apply the
> >>> same context cancellation upon changing engines.
> >>>
> >>> Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
> >>> Testcase: XXX
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
> >>>    1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> index 52a749691a8d..20f1d3e0221f 100644
> >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
> >>>    
> >>>    replace:
> >>>        mutex_lock(&ctx->engines_mutex);
> >>> +
> >>> +     /* Flush stale requests off the old engines if required */
> >>> +     if (!i915_gem_context_is_persistent(ctx) ||
> >>> +         !i915_modparams.enable_hangcheck)
> >>> +             kill_context(ctx);
> >>
> >> Is the negative effect of this is legit contexts can't keep submitting
> >> and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but
> >> still. Might break legitimate userspace. Not that I offer solutions.. :(
> >> Banning changing engines once context went non-persistent? That too can
> >> break someone.
> > 
> > It closes the hole we have. To do otherwise, we need to keep track of
> > the old engines. Not an impossible task, certainly inconvenient.
> > 
> > struct old_engines {
> >       struct i915_active active;
> >       struct list_head link;
> >       struct i915_gem_context *ctx;
> >       void *engines;
> >       int num_engines;
> > };
> > 
> > With a list+spinlock in the ctx that we can work in kill_context.
> > 
> > The biggest catch there is actually worrying about attaching the active
> > to already executing request, and making sure the coupling doesn't bug
> > on a concurrent completion. Hmm, it's just a completion callback, but
> > more convenient to use a ready made one.
> 
> What would you do with old engines? We don't have a mechanism to mark 
> intel_context closed. Hm, right, it would get unreachable by definition. 
> But how to terminate it if it doesn't play nicely?

Wait 30 minutes (if it passes the tests) and you'll find out :)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines
  2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
  2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
  2020-02-05 17:17 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2) Patchwork
@ 2020-02-05 19:39 ` Chris Wilson
  2020-02-05 20:32   ` Chris Wilson
  2020-02-05 22:05 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4) Patchwork
  2020-02-05 22:37 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  4 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 19:39 UTC (permalink / raw)
  To: intel-gfx

If we have a set of active engines marked as being non-persistent, we
lose track of those if the user replaces those engines with
I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
non-persistent requests are terminated if they are no longer being
tracked by the user's context (in order to prevent a lost request
causing an untracked and so unstoppable GPU hang), we need to apply the
same context cancellation upon changing engines.

v2: Track stale engines[] so we only reap at context closure.

Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
Testcase: igt/gem_ctx_peristence/replace
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 109 ++++++++++++++++--
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  11 +-
 drivers/gpu/drm/i915/i915_sw_fence.c          |  15 ++-
 drivers/gpu/drm/i915/i915_sw_fence.h          |   2 +-
 4 files changed, 124 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 52a749691a8d..01ffb1d1c6a0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -270,7 +270,8 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx)
 	if (!e)
 		return ERR_PTR(-ENOMEM);
 
-	init_rcu_head(&e->rcu);
+	e->ctx = ctx;
+
 	for_each_engine(engine, gt, id) {
 		struct intel_context *ce;
 
@@ -450,7 +451,7 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce)
 	return engine;
 }
 
-static void kill_context(struct i915_gem_context *ctx)
+static void kill_engines(struct i915_gem_engines *engines)
 {
 	struct i915_gem_engines_iter it;
 	struct intel_context *ce;
@@ -462,7 +463,7 @@ static void kill_context(struct i915_gem_context *ctx)
 	 * However, we only care about pending requests, so only include
 	 * engines on which there are incomplete requests.
 	 */
-	for_each_gem_engine(ce, __context_engines_static(ctx), it) {
+	for_each_gem_engine(ce, engines, it) {
 		struct intel_engine_cs *engine;
 
 		if (intel_context_set_banned(ce))
@@ -484,8 +485,32 @@ static void kill_context(struct i915_gem_context *ctx)
 			 * the context from the GPU, we have to resort to a full
 			 * reset. We hope the collateral damage is worth it.
 			 */
-			__reset_context(ctx, engine);
+			__reset_context(engines->ctx, engine);
+	}
+}
+
+static void kill_context(struct i915_gem_context *ctx)
+{
+	struct i915_gem_engines *pos, *next;
+
+	spin_lock(&ctx->stale_lock);
+	list_for_each_entry_safe(pos, next, &ctx->stale_list, link) {
+		if (!i915_sw_fence_await(&pos->fence))
+			continue;
+
+		spin_unlock(&ctx->stale_lock);
+
+		kill_engines(pos);
+
+		spin_lock(&ctx->stale_lock);
+		list_safe_reset_next(pos, next, link);
+		list_del_init(&pos->link);
+
+		i915_sw_fence_complete(&pos->fence);
 	}
+	spin_unlock(&ctx->stale_lock);
+
+	kill_engines(__context_engines_static(ctx));
 }
 
 static void set_closed_name(struct i915_gem_context *ctx)
@@ -602,6 +627,9 @@ __create_context(struct drm_i915_private *i915)
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
 	mutex_init(&ctx->mutex);
 
+	INIT_LIST_HEAD(&ctx->stale_list);
+	spin_lock_init(&ctx->stale_lock);
+
 	mutex_init(&ctx->engines_mutex);
 	e = default_engines(ctx);
 	if (IS_ERR(e)) {
@@ -1529,6 +1557,66 @@ static const i915_user_extension_fn set_engines__extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
 };
 
+static int engines_notify(struct i915_sw_fence *fence,
+			  enum i915_sw_fence_notify state)
+{
+	struct i915_gem_engines *engines =
+		container_of(fence, typeof(*engines), fence);
+
+	switch (state) {
+	case FENCE_COMPLETE:
+		if (!list_empty(&engines->link)) {
+			spin_lock(&engines->ctx->stale_lock);
+			list_del(&engines->link);
+			spin_unlock(&engines->ctx->stale_lock);
+		}
+		break;
+
+	case FENCE_FREE:
+		init_rcu_head(&engines->rcu);
+		call_rcu(&engines->rcu, free_engines_rcu);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static int engines_listen(struct i915_gem_engines *engines)
+{
+	struct i915_gem_engines_iter it;
+	struct intel_context *ce;
+
+	GEM_BUG_ON(!engines);
+	i915_sw_fence_init(&engines->fence, engines_notify);
+
+	for_each_gem_engine(ce, engines, it) {
+		struct dma_fence *fence;
+		int err;
+
+		if (!ce->timeline)
+			continue;
+
+		fence = i915_active_fence_get(&ce->timeline->last_request);
+		if (!fence)
+			continue;
+
+		err = i915_sw_fence_await_dma_fence(&engines->fence,
+						    fence, 0,
+						    GFP_KERNEL);
+
+		dma_fence_put(fence);
+		if (err < 0)
+			return err;
+	}
+
+	spin_lock(&engines->ctx->stale_lock);
+	list_add(&engines->link, &engines->ctx->stale_list);
+	spin_unlock(&engines->ctx->stale_lock);
+
+	i915_sw_fence_commit(&engines->fence);
+	return 0;
+}
+
 static int
 set_engines(struct i915_gem_context *ctx,
 	    const struct drm_i915_gem_context_param *args)
@@ -1571,7 +1659,8 @@ set_engines(struct i915_gem_context *ctx,
 	if (!set.engines)
 		return -ENOMEM;
 
-	init_rcu_head(&set.engines->rcu);
+	set.engines->ctx = ctx;
+
 	for (n = 0; n < num_engines; n++) {
 		struct i915_engine_class_instance ci;
 		struct intel_engine_cs *engine;
@@ -1631,7 +1720,11 @@ set_engines(struct i915_gem_context *ctx,
 	set.engines = rcu_replace_pointer(ctx->engines, set.engines, 1);
 	mutex_unlock(&ctx->engines_mutex);
 
-	call_rcu(&set.engines->rcu, free_engines_rcu);
+	/* Keep track of old engine sets for kill_context() */
+	if (engines_listen(set.engines)) {
+		kill_engines(set.engines); /* on error, immediately cleanup */
+		engines_notify(&set.engines->fence, FENCE_FREE);
+	}
 
 	return 0;
 }
@@ -1646,7 +1739,6 @@ __copy_engines(struct i915_gem_engines *e)
 	if (!copy)
 		return ERR_PTR(-ENOMEM);
 
-	init_rcu_head(&copy->rcu);
 	for (n = 0; n < e->num_engines; n++) {
 		if (e->engines[n])
 			copy->engines[n] = intel_context_get(e->engines[n]);
@@ -1890,7 +1982,8 @@ static int clone_engines(struct i915_gem_context *dst,
 	if (!clone)
 		goto err_unlock;
 
-	init_rcu_head(&clone->rcu);
+	clone->ctx = dst;
+
 	for (n = 0; n < e->num_engines; n++) {
 		struct intel_engine_cs *engine;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 017ca803ab47..39c2f6189684 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -20,6 +20,7 @@
 #include "gt/intel_context_types.h"
 
 #include "i915_scheduler.h"
+#include "i915_sw_fence.h"
 
 struct pid;
 
@@ -30,7 +31,12 @@ struct intel_timeline;
 struct intel_ring;
 
 struct i915_gem_engines {
-	struct rcu_head rcu;
+	union {
+		struct rcu_head rcu;
+		struct list_head link;
+	};
+	struct i915_sw_fence fence;
+	struct i915_gem_context *ctx;
 	unsigned int num_engines;
 	struct intel_context *engines[];
 };
@@ -173,6 +179,9 @@ struct i915_gem_context {
 	 * context in messages.
 	 */
 	char name[TASK_COMM_LEN + 8];
+
+	struct spinlock stale_lock;
+	struct list_head stale_list;
 };
 
 #endif /* __I915_GEM_CONTEXT_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 51ba97daf2a0..e47b9cf952c4 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -211,10 +211,19 @@ void i915_sw_fence_complete(struct i915_sw_fence *fence)
 	__i915_sw_fence_complete(fence, NULL);
 }
 
-void i915_sw_fence_await(struct i915_sw_fence *fence)
+bool i915_sw_fence_await(struct i915_sw_fence *fence)
 {
-	debug_fence_assert(fence);
-	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
+	int old, new;
+
+	new = atomic_read(&fence->pending);
+	do {
+		if (new < 1)
+			return false;
+
+		old = new++;
+	} while((new = atomic_cmpxchg(&fence->pending, old, new)) != old);
+
+	return true;
 }
 
 void __i915_sw_fence_init(struct i915_sw_fence *fence,
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 19e806ce43bc..30a863353ee6 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -91,7 +91,7 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    unsigned long timeout,
 				    gfp_t gfp);
 
-void i915_sw_fence_await(struct i915_sw_fence *fence);
+bool i915_sw_fence_await(struct i915_sw_fence *fence);
 void i915_sw_fence_complete(struct i915_sw_fence *fence);
 
 static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
-- 
2.25.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines
  2020-02-05 19:39 ` [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
@ 2020-02-05 20:32   ` Chris Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 20:32 UTC (permalink / raw)
  To: intel-gfx

If we have a set of active engines marked as being non-persistent, we
lose track of those if the user replaces those engines with
I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
non-persistent requests are terminated if they are no longer being
tracked by the user's context (in order to prevent a lost request
causing an untracked and so unstoppable GPU hang), we need to apply the
same context cancellation upon changing engines.

v2: Track stale engines[] so we only reap at context closure.

Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
Testcase: igt/gem_ctx_peristence/replace
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 118 ++++++++++++++++--
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  11 +-
 drivers/gpu/drm/i915/i915_sw_fence.c          |  15 ++-
 drivers/gpu/drm/i915/i915_sw_fence.h          |   2 +-
 4 files changed, 133 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 52a749691a8d..21ce84cc1f68 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -270,7 +270,8 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx)
 	if (!e)
 		return ERR_PTR(-ENOMEM);
 
-	init_rcu_head(&e->rcu);
+	e->ctx = ctx;
+
 	for_each_engine(engine, gt, id) {
 		struct intel_context *ce;
 
@@ -450,7 +451,7 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce)
 	return engine;
 }
 
-static void kill_context(struct i915_gem_context *ctx)
+static void kill_engines(struct i915_gem_engines *engines)
 {
 	struct i915_gem_engines_iter it;
 	struct intel_context *ce;
@@ -462,7 +463,7 @@ static void kill_context(struct i915_gem_context *ctx)
 	 * However, we only care about pending requests, so only include
 	 * engines on which there are incomplete requests.
 	 */
-	for_each_gem_engine(ce, __context_engines_static(ctx), it) {
+	for_each_gem_engine(ce, engines, it) {
 		struct intel_engine_cs *engine;
 
 		if (intel_context_set_banned(ce))
@@ -484,10 +485,41 @@ static void kill_context(struct i915_gem_context *ctx)
 			 * the context from the GPU, we have to resort to a full
 			 * reset. We hope the collateral damage is worth it.
 			 */
-			__reset_context(ctx, engine);
+			__reset_context(engines->ctx, engine);
+	}
+}
+
+static void kill_stale_engines(struct i915_gem_context *ctx)
+{
+	if (!list_empty(&ctx->stale_list)) {
+		struct i915_gem_engines *pos, *next;
+		unsigned long flags;
+
+		spin_lock_irqsave(&ctx->stale_lock, flags);
+		list_for_each_entry_safe(pos, next, &ctx->stale_list, link) {
+			if (!i915_sw_fence_await(&pos->fence))
+				continue;
+
+			spin_unlock_irqrestore(&ctx->stale_lock, flags);
+
+			kill_engines(pos);
+
+			spin_lock_irqsave(&ctx->stale_lock, flags);
+			list_safe_reset_next(pos, next, link);
+			list_del_init(&pos->link);
+
+			i915_sw_fence_complete(&pos->fence);
+		}
+		spin_unlock_irqrestore(&ctx->stale_lock, flags);
 	}
 }
 
+static void kill_context(struct i915_gem_context *ctx)
+{
+	kill_stale_engines(ctx);
+	kill_engines(__context_engines_static(ctx));
+}
+
 static void set_closed_name(struct i915_gem_context *ctx)
 {
 	char *s;
@@ -602,6 +634,9 @@ __create_context(struct drm_i915_private *i915)
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
 	mutex_init(&ctx->mutex);
 
+	INIT_LIST_HEAD(&ctx->stale_list);
+	spin_lock_init(&ctx->stale_lock);
+
 	mutex_init(&ctx->engines_mutex);
 	e = default_engines(ctx);
 	if (IS_ERR(e)) {
@@ -1529,6 +1564,71 @@ static const i915_user_extension_fn set_engines__extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
 };
 
+static int engines_notify(struct i915_sw_fence *fence,
+			  enum i915_sw_fence_notify state)
+{
+	struct i915_gem_engines *engines =
+		container_of(fence, typeof(*engines), fence);
+
+	switch (state) {
+	case FENCE_COMPLETE:
+		if (!list_empty(&engines->link)) {
+			struct i915_gem_context *ctx = engines->ctx;
+			unsigned long flags;
+
+			spin_lock_irqsave(&ctx->stale_lock, flags);
+			list_del(&engines->link);
+			spin_unlock_irqrestore(&ctx->stale_lock, flags);
+		}
+		break;
+
+	case FENCE_FREE:
+		init_rcu_head(&engines->rcu);
+		call_rcu(&engines->rcu, free_engines_rcu);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static void engines_idle_release(struct i915_gem_engines *engines)
+{
+	struct i915_gem_engines_iter it;
+	struct intel_context *ce;
+	unsigned long flags;
+
+	GEM_BUG_ON(!engines);
+	i915_sw_fence_init(&engines->fence, engines_notify);
+
+	spin_lock_irqsave(&engines->ctx->stale_lock, flags);
+	list_add(&engines->link, &engines->ctx->stale_list);
+	spin_unlock_irqrestore(&engines->ctx->stale_lock, flags);
+
+	for_each_gem_engine(ce, engines, it) {
+		struct dma_fence *fence;
+		int err;
+
+		if (!ce->timeline)
+			continue;
+
+		fence = i915_active_fence_get(&ce->timeline->last_request);
+		if (!fence)
+			continue;
+
+		err = i915_sw_fence_await_dma_fence(&engines->fence,
+						    fence, 0,
+						    GFP_KERNEL);
+
+		dma_fence_put(fence);
+		if (err < 0) {
+			kill_engines(engines);
+			break;
+		}
+	}
+
+	i915_sw_fence_commit(&engines->fence);
+}
+
 static int
 set_engines(struct i915_gem_context *ctx,
 	    const struct drm_i915_gem_context_param *args)
@@ -1571,7 +1671,8 @@ set_engines(struct i915_gem_context *ctx,
 	if (!set.engines)
 		return -ENOMEM;
 
-	init_rcu_head(&set.engines->rcu);
+	set.engines->ctx = ctx;
+
 	for (n = 0; n < num_engines; n++) {
 		struct i915_engine_class_instance ci;
 		struct intel_engine_cs *engine;
@@ -1631,7 +1732,8 @@ set_engines(struct i915_gem_context *ctx,
 	set.engines = rcu_replace_pointer(ctx->engines, set.engines, 1);
 	mutex_unlock(&ctx->engines_mutex);
 
-	call_rcu(&set.engines->rcu, free_engines_rcu);
+	/* Keep track of old engine sets for kill_context() */
+	engines_idle_release(set.engines);
 
 	return 0;
 }
@@ -1646,7 +1748,6 @@ __copy_engines(struct i915_gem_engines *e)
 	if (!copy)
 		return ERR_PTR(-ENOMEM);
 
-	init_rcu_head(&copy->rcu);
 	for (n = 0; n < e->num_engines; n++) {
 		if (e->engines[n])
 			copy->engines[n] = intel_context_get(e->engines[n]);
@@ -1890,7 +1991,8 @@ static int clone_engines(struct i915_gem_context *dst,
 	if (!clone)
 		goto err_unlock;
 
-	init_rcu_head(&clone->rcu);
+	clone->ctx = dst;
+
 	for (n = 0; n < e->num_engines; n++) {
 		struct intel_engine_cs *engine;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 017ca803ab47..39c2f6189684 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -20,6 +20,7 @@
 #include "gt/intel_context_types.h"
 
 #include "i915_scheduler.h"
+#include "i915_sw_fence.h"
 
 struct pid;
 
@@ -30,7 +31,12 @@ struct intel_timeline;
 struct intel_ring;
 
 struct i915_gem_engines {
-	struct rcu_head rcu;
+	union {
+		struct rcu_head rcu;
+		struct list_head link;
+	};
+	struct i915_sw_fence fence;
+	struct i915_gem_context *ctx;
 	unsigned int num_engines;
 	struct intel_context *engines[];
 };
@@ -173,6 +179,9 @@ struct i915_gem_context {
 	 * context in messages.
 	 */
 	char name[TASK_COMM_LEN + 8];
+
+	struct spinlock stale_lock;
+	struct list_head stale_list;
 };
 
 #endif /* __I915_GEM_CONTEXT_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 51ba97daf2a0..9a20b7246f91 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -211,10 +211,19 @@ void i915_sw_fence_complete(struct i915_sw_fence *fence)
 	__i915_sw_fence_complete(fence, NULL);
 }
 
-void i915_sw_fence_await(struct i915_sw_fence *fence)
+bool i915_sw_fence_await(struct i915_sw_fence *fence)
 {
-	debug_fence_assert(fence);
-	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
+	int old, new;
+
+	new = atomic_read(&fence->pending);
+	do {
+		if (new < 1)
+			return false;
+
+		old = new++;
+	} while ((new = atomic_cmpxchg(&fence->pending, old, new)) != old);
+
+	return true;
 }
 
 void __i915_sw_fence_init(struct i915_sw_fence *fence,
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 19e806ce43bc..30a863353ee6 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -91,7 +91,7 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    unsigned long timeout,
 				    gfp_t gfp);
 
-void i915_sw_fence_await(struct i915_sw_fence *fence);
+bool i915_sw_fence_await(struct i915_sw_fence *fence);
 void i915_sw_fence_complete(struct i915_sw_fence *fence);
 
 static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
-- 
2.25.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4)
  2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
                   ` (2 preceding siblings ...)
  2020-02-05 19:39 ` [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
@ 2020-02-05 22:05 ` Patchwork
  2020-02-05 22:37 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-02-05 22:05 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4)
URL   : https://patchwork.freedesktop.org/series/73023/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
66727965053c drm/i915/gem: Don't leak non-persistent requests on changing engines
-:248: WARNING:USE_SPINLOCK_T: struct spinlock should be spinlock_t
#248: FILE: drivers/gpu/drm/i915/gem/i915_gem_context_types.h:183:
+	struct spinlock stale_lock;

total: 0 errors, 1 warnings, 0 checks, 240 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4)
  2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
                   ` (3 preceding siblings ...)
  2020-02-05 22:05 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4) Patchwork
@ 2020-02-05 22:37 ` Patchwork
  4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-02-05 22:37 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4)
URL   : https://patchwork.freedesktop.org/series/73023/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_7871 -> Patchwork_16442
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_16442 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_16442, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_16442:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live_gem_contexts:
    - fi-kbl-x1275:       [PASS][1] -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-kbl-x1275/igt@i915_selftest@live_gem_contexts.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-kbl-x1275/igt@i915_selftest@live_gem_contexts.html

  
Known issues
------------

  Here are the changes found in Patchwork_16442 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live_blt:
    - fi-hsw-4770r:       [PASS][3] -> [DMESG-FAIL][4] ([i915#553] / [i915#725])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-hsw-4770r/igt@i915_selftest@live_blt.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-hsw-4770r/igt@i915_selftest@live_blt.html
    - fi-hsw-4770:        [PASS][5] -> [DMESG-FAIL][6] ([i915#725])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-hsw-4770/igt@i915_selftest@live_blt.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-hsw-4770/igt@i915_selftest@live_blt.html

  * igt@i915_selftest@live_execlists:
    - fi-icl-y:           [PASS][7] -> [DMESG-FAIL][8] ([fdo#108569])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-icl-y/igt@i915_selftest@live_execlists.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-icl-y/igt@i915_selftest@live_execlists.html

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-icl-u2:          [PASS][9] -> [FAIL][10] ([i915#217])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html

  * igt@prime_self_import@basic-llseek-size:
    - fi-tgl-y:           [PASS][11] -> [DMESG-WARN][12] ([CI#94] / [i915#402]) +1 similar issue
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-tgl-y/igt@prime_self_import@basic-llseek-size.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-tgl-y/igt@prime_self_import@basic-llseek-size.html

  
#### Possible fixes ####

  * igt@i915_selftest@live_gtt:
    - fi-icl-guc:         [TIMEOUT][13] ([fdo#112271]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-icl-guc/igt@i915_selftest@live_gtt.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-icl-guc/igt@i915_selftest@live_gtt.html

  * igt@kms_addfb_basic@bad-pitch-128:
    - fi-tgl-y:           [DMESG-WARN][15] ([CI#94] / [i915#402]) -> [PASS][16] +1 similar issue
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-tgl-y/igt@kms_addfb_basic@bad-pitch-128.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-tgl-y/igt@kms_addfb_basic@bad-pitch-128.html

  
  [CI#94]: https://gitlab.freedesktop.org/gfx-ci/i915-infra/issues/94
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#112271]: https://bugs.freedesktop.org/show_bug.cgi?id=112271
  [i915#217]: https://gitlab.freedesktop.org/drm/intel/issues/217
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#553]: https://gitlab.freedesktop.org/drm/intel/issues/553
  [i915#725]: https://gitlab.freedesktop.org/drm/intel/issues/725


Participating hosts (45 -> 42)
------------------------------

  Additional (5): fi-bdw-5557u fi-hsw-peppy fi-bwr-2160 fi-ilk-650 fi-kbl-7500u 
  Missing    (8): fi-bsw-n3050 fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-ivb-3770 fi-byt-clapper fi-bdw-samus fi-kbl-r 


Build changes
-------------

  * CI: CI-20190529 -> None
  * Linux: CI_DRM_7871 -> Patchwork_16442

  CI-20190529: 20190529
  CI_DRM_7871: c9b0237ee7ffb1bbb62f864f0b2d7b290ee1313d @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5419: 44913a91e77434b03001bb9ea53216cd03c476e6 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_16442: 66727965053c0e896d166743999590651bf4ca80 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

66727965053c drm/i915/gem: Don't leak non-persistent requests on changing engines

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-02-05 22:37 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
2020-02-05 16:22   ` Tvrtko Ursulin
2020-02-05 16:44     ` Chris Wilson
2020-02-05 18:33       ` Tvrtko Ursulin
2020-02-05 18:39         ` Chris Wilson
2020-02-05 17:17 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2) Patchwork
2020-02-05 19:39 ` [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
2020-02-05 20:32   ` Chris Wilson
2020-02-05 22:05 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4) Patchwork
2020-02-05 22:37 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.