* [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines
@ 2020-02-05 12:11 Chris Wilson
2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
` (4 more replies)
0 siblings, 5 replies; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 12:11 UTC (permalink / raw)
To: intel-gfx
If we have a set of active engines marked as being non-persistent, we
lose track of those if the user replaces those engines with
I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
non-persistent requests are terminated if they are no longer being
tracked by the user's context (in order to prevent a lost request
causing an untracked and so unstoppable GPU hang), we need to apply the
same context cancellation upon changing engines.
Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
Testcase: XXX
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 52a749691a8d..5d4093266103 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1623,6 +1623,11 @@ set_engines(struct i915_gem_context *ctx,
}
replace:
+ /* Flush stale requests off the old engines if required */
+ if (!i915_gem_context_is_persistent(ctx) ||
+ !i915_modparams.enable_hangcheck)
+ kill_context(ctx);
+
mutex_lock(&ctx->engines_mutex);
if (args->size)
i915_gem_context_set_user_engines(ctx);
--
2.25.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
@ 2020-02-05 12:13 ` Chris Wilson
2020-02-05 16:22 ` Tvrtko Ursulin
2020-02-05 17:17 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2) Patchwork
` (3 subsequent siblings)
4 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 12:13 UTC (permalink / raw)
To: intel-gfx
If we have a set of active engines marked as being non-persistent, we
lose track of those if the user replaces those engines with
I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
non-persistent requests are terminated if they are no longer being
tracked by the user's context (in order to prevent a lost request
causing an untracked and so unstoppable GPU hang), we need to apply the
same context cancellation upon changing engines.
Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
Testcase: XXX
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 52a749691a8d..20f1d3e0221f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
replace:
mutex_lock(&ctx->engines_mutex);
+
+ /* Flush stale requests off the old engines if required */
+ if (!i915_gem_context_is_persistent(ctx) ||
+ !i915_modparams.enable_hangcheck)
+ kill_context(ctx);
+
if (args->size)
i915_gem_context_set_user_engines(ctx);
else
i915_gem_context_clear_user_engines(ctx);
set.engines = rcu_replace_pointer(ctx->engines, set.engines, 1);
+
mutex_unlock(&ctx->engines_mutex);
call_rcu(&set.engines->rcu, free_engines_rcu);
--
2.25.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
@ 2020-02-05 16:22 ` Tvrtko Ursulin
2020-02-05 16:44 ` Chris Wilson
0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2020-02-05 16:22 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 05/02/2020 12:13, Chris Wilson wrote:
> If we have a set of active engines marked as being non-persistent, we
> lose track of those if the user replaces those engines with
> I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
> non-persistent requests are terminated if they are no longer being
> tracked by the user's context (in order to prevent a lost request
> causing an untracked and so unstoppable GPU hang), we need to apply the
> same context cancellation upon changing engines.
>
> Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
> Testcase: XXX
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
> drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 52a749691a8d..20f1d3e0221f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
>
> replace:
> mutex_lock(&ctx->engines_mutex);
> +
> + /* Flush stale requests off the old engines if required */
> + if (!i915_gem_context_is_persistent(ctx) ||
> + !i915_modparams.enable_hangcheck)
> + kill_context(ctx);
Is the negative effect of this is legit contexts can't keep submitting
and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but
still. Might break legitimate userspace. Not that I offer solutions.. :(
Banning changing engines once context went non-persistent? That too can
break someone.
Regards,
Tvrtko
> +
> if (args->size)
> i915_gem_context_set_user_engines(ctx);
> else
> i915_gem_context_clear_user_engines(ctx);
> set.engines = rcu_replace_pointer(ctx->engines, set.engines, 1);
> +
> mutex_unlock(&ctx->engines_mutex);
>
> call_rcu(&set.engines->rcu, free_engines_rcu);
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
2020-02-05 16:22 ` Tvrtko Ursulin
@ 2020-02-05 16:44 ` Chris Wilson
2020-02-05 18:33 ` Tvrtko Ursulin
0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 16:44 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2020-02-05 16:22:34)
>
>
> On 05/02/2020 12:13, Chris Wilson wrote:
> > If we have a set of active engines marked as being non-persistent, we
> > lose track of those if the user replaces those engines with
> > I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
> > non-persistent requests are terminated if they are no longer being
> > tracked by the user's context (in order to prevent a lost request
> > causing an untracked and so unstoppable GPU hang), we need to apply the
> > same context cancellation upon changing engines.
> >
> > Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
> > Testcase: XXX
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> > drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index 52a749691a8d..20f1d3e0221f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
> >
> > replace:
> > mutex_lock(&ctx->engines_mutex);
> > +
> > + /* Flush stale requests off the old engines if required */
> > + if (!i915_gem_context_is_persistent(ctx) ||
> > + !i915_modparams.enable_hangcheck)
> > + kill_context(ctx);
>
> Is the negative effect of this is legit contexts can't keep submitting
> and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but
> still. Might break legitimate userspace. Not that I offer solutions.. :(
> Banning changing engines once context went non-persistent? That too can
> break someone.
It closes the hole we have. To do otherwise, we need to keep track of
the old engines. Not an impossible task, certainly inconvenient.
struct old_engines {
struct i915_active active;
struct list_head link;
struct i915_gem_context *ctx;
void *engines;
int num_engines;
};
With a list+spinlock in the ctx that we can work in kill_context.
The biggest catch there is actually worrying about attaching the active
to already executing request, and making sure the coupling doesn't bug
on a concurrent completion. Hmm, it's just a completion callback, but
more convenient to use a ready made one.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2)
2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
@ 2020-02-05 17:17 ` Patchwork
2020-02-05 19:39 ` [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
` (2 subsequent siblings)
4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-02-05 17:17 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2)
URL : https://patchwork.freedesktop.org/series/73023/
State : success
== Summary ==
CI Bug Log - changes from CI_DRM_7869 -> Patchwork_16434
====================================================
Summary
-------
**SUCCESS**
No regressions found.
External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/index.html
Known issues
------------
Here are the changes found in Patchwork_16434 that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@gem_close_race@basic-threads:
- fi-byt-j1900: [PASS][1] -> [INCOMPLETE][2] ([i915#45])
[1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-byt-j1900/igt@gem_close_race@basic-threads.html
[2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-byt-j1900/igt@gem_close_race@basic-threads.html
* igt@i915_selftest@live_execlists:
- fi-icl-y: [PASS][3] -> [DMESG-FAIL][4] ([fdo#108569])
[3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-icl-y/igt@i915_selftest@live_execlists.html
[4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-icl-y/igt@i915_selftest@live_execlists.html
* igt@i915_selftest@live_gem_contexts:
- fi-kbl-x1275: [PASS][5] -> [INCOMPLETE][6] ([i915#504])
[5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-kbl-x1275/igt@i915_selftest@live_gem_contexts.html
[6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-kbl-x1275/igt@i915_selftest@live_gem_contexts.html
* igt@i915_selftest@live_gtt:
- fi-bdw-5557u: [PASS][7] -> [TIMEOUT][8] ([fdo#112271])
[7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-bdw-5557u/igt@i915_selftest@live_gtt.html
[8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-bdw-5557u/igt@i915_selftest@live_gtt.html
* igt@prime_self_import@basic-llseek-bad:
- fi-tgl-y: [PASS][9] -> [DMESG-WARN][10] ([CI#94] / [i915#402]) +1 similar issue
[9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-tgl-y/igt@prime_self_import@basic-llseek-bad.html
[10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-tgl-y/igt@prime_self_import@basic-llseek-bad.html
#### Possible fixes ####
* igt@gem_exec_parallel@fds:
- fi-icl-u3: [INCOMPLETE][11] -> [PASS][12]
[11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-icl-u3/igt@gem_exec_parallel@fds.html
[12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-icl-u3/igt@gem_exec_parallel@fds.html
* igt@gem_exec_suspend@basic-s4-devices:
- fi-tgl-y: [FAIL][13] ([CI#94]) -> [PASS][14]
[13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-tgl-y/igt@gem_exec_suspend@basic-s4-devices.html
[14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-tgl-y/igt@gem_exec_suspend@basic-s4-devices.html
* igt@i915_getparams_basic@basic-subslice-total:
- fi-tgl-y: [DMESG-WARN][15] ([CI#94] / [i915#402]) -> [PASS][16] +1 similar issue
[15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-tgl-y/igt@i915_getparams_basic@basic-subslice-total.html
[16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-tgl-y/igt@i915_getparams_basic@basic-subslice-total.html
* igt@i915_module_load@reload:
- fi-skl-6770hq: [DMESG-WARN][17] ([i915#92]) -> [PASS][18] +1 similar issue
[17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-skl-6770hq/igt@i915_module_load@reload.html
[18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-skl-6770hq/igt@i915_module_load@reload.html
* igt@i915_selftest@live_blt:
- fi-ivb-3770: [DMESG-FAIL][19] ([i915#725]) -> [PASS][20]
[19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-ivb-3770/igt@i915_selftest@live_blt.html
[20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-ivb-3770/igt@i915_selftest@live_blt.html
* igt@kms_chamelium@hdmi-crc-fast:
- fi-kbl-7500u: [FAIL][21] ([fdo#109635] / [i915#217]) -> [PASS][22]
[21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-kbl-7500u/igt@kms_chamelium@hdmi-crc-fast.html
[22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-kbl-7500u/igt@kms_chamelium@hdmi-crc-fast.html
* igt@kms_chamelium@hdmi-hpd-fast:
- fi-kbl-7500u: [FAIL][23] ([fdo#111096] / [i915#323]) -> [PASS][24]
[23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
[24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-kbl-7500u/igt@kms_chamelium@hdmi-hpd-fast.html
* igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence:
- fi-skl-6770hq: [SKIP][25] ([fdo#109271]) -> [PASS][26] +4 similar issues
[25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-skl-6770hq/igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence.html
[26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-skl-6770hq/igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence.html
* igt@kms_pipe_crc_basic@read-crc-pipe-c:
- fi-skl-6770hq: [DMESG-WARN][27] ([i915#106] / [i915#188]) -> [PASS][28]
[27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-skl-6770hq/igt@kms_pipe_crc_basic@read-crc-pipe-c.html
[28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-skl-6770hq/igt@kms_pipe_crc_basic@read-crc-pipe-c.html
#### Warnings ####
* igt@i915_pm_rpm@module-reload:
- fi-skl-6770hq: [DMESG-WARN][29] ([i915#92]) -> [FAIL][30] ([i915#178])
[29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-skl-6770hq/igt@i915_pm_rpm@module-reload.html
[30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-skl-6770hq/igt@i915_pm_rpm@module-reload.html
* igt@i915_selftest@live_blt:
- fi-hsw-4770: [DMESG-FAIL][31] ([i915#553]) -> [DMESG-FAIL][32] ([i915#553] / [i915#725])
[31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7869/fi-hsw-4770/igt@i915_selftest@live_blt.html
[32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/fi-hsw-4770/igt@i915_selftest@live_blt.html
[CI#94]: https://gitlab.freedesktop.org/gfx-ci/i915-infra/issues/94
[fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
[fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
[fdo#109635]: https://bugs.freedesktop.org/show_bug.cgi?id=109635
[fdo#111096]: https://bugs.freedesktop.org/show_bug.cgi?id=111096
[fdo#112271]: https://bugs.freedesktop.org/show_bug.cgi?id=112271
[i915#106]: https://gitlab.freedesktop.org/drm/intel/issues/106
[i915#178]: https://gitlab.freedesktop.org/drm/intel/issues/178
[i915#188]: https://gitlab.freedesktop.org/drm/intel/issues/188
[i915#217]: https://gitlab.freedesktop.org/drm/intel/issues/217
[i915#323]: https://gitlab.freedesktop.org/drm/intel/issues/323
[i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
[i915#45]: https://gitlab.freedesktop.org/drm/intel/issues/45
[i915#504]: https://gitlab.freedesktop.org/drm/intel/issues/504
[i915#553]: https://gitlab.freedesktop.org/drm/intel/issues/553
[i915#725]: https://gitlab.freedesktop.org/drm/intel/issues/725
[i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
Participating hosts (49 -> 39)
------------------------------
Additional (2): fi-kbl-7560u fi-gdg-551
Missing (12): fi-kbl-soraka fi-hsw-4200u fi-bdw-gvtdvm fi-bsw-cyan fi-ctg-p8600 fi-elk-e7500 fi-bdw-samus fi-byt-n2820 fi-byt-clapper fi-bsw-nick fi-skl-6600u fi-snb-2600
Build changes
-------------
* CI: CI-20190529 -> None
* Linux: CI_DRM_7869 -> Patchwork_16434
CI-20190529: 20190529
CI_DRM_7869: db0579be255412f38a450c3c577f8d10f1195034 @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_5419: 44913a91e77434b03001bb9ea53216cd03c476e6 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_16434: ff4baf44b0269a86adf6319dafac50877260ce2f @ git://anongit.freedesktop.org/gfx-ci/linux
== Linux commits ==
ff4baf44b026 drm/i915/gem: Don't leak non-persistent requests on changing engines
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16434/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
2020-02-05 16:44 ` Chris Wilson
@ 2020-02-05 18:33 ` Tvrtko Ursulin
2020-02-05 18:39 ` Chris Wilson
0 siblings, 1 reply; 11+ messages in thread
From: Tvrtko Ursulin @ 2020-02-05 18:33 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 05/02/2020 16:44, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-02-05 16:22:34)
>> On 05/02/2020 12:13, Chris Wilson wrote:
>>> If we have a set of active engines marked as being non-persistent, we
>>> lose track of those if the user replaces those engines with
>>> I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
>>> non-persistent requests are terminated if they are no longer being
>>> tracked by the user's context (in order to prevent a lost request
>>> causing an untracked and so unstoppable GPU hang), we need to apply the
>>> same context cancellation upon changing engines.
>>>
>>> Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
>>> Testcase: XXX
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>> drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> index 52a749691a8d..20f1d3e0221f 100644
>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>> @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
>>>
>>> replace:
>>> mutex_lock(&ctx->engines_mutex);
>>> +
>>> + /* Flush stale requests off the old engines if required */
>>> + if (!i915_gem_context_is_persistent(ctx) ||
>>> + !i915_modparams.enable_hangcheck)
>>> + kill_context(ctx);
>>
>> Is the negative effect of this is legit contexts can't keep submitting
>> and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but
>> still. Might break legitimate userspace. Not that I offer solutions.. :(
>> Banning changing engines once context went non-persistent? That too can
>> break someone.
>
> It closes the hole we have. To do otherwise, we need to keep track of
> the old engines. Not an impossible task, certainly inconvenient.
>
> struct old_engines {
> struct i915_active active;
> struct list_head link;
> struct i915_gem_context *ctx;
> void *engines;
> int num_engines;
> };
>
> With a list+spinlock in the ctx that we can work in kill_context.
>
> The biggest catch there is actually worrying about attaching the active
> to already executing request, and making sure the coupling doesn't bug
> on a concurrent completion. Hmm, it's just a completion callback, but
> more convenient to use a ready made one.
What would you do with old engines? We don't have a mechanism to mark
intel_context closed. Hm, right, it would get unreachable by definition.
But how to terminate it if it doesn't play nicely?
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Intel-gfx] [PATCH v2] drm/i915/gem: Don't leak non-persistent requests on changing engines
2020-02-05 18:33 ` Tvrtko Ursulin
@ 2020-02-05 18:39 ` Chris Wilson
0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 18:39 UTC (permalink / raw)
To: Tvrtko Ursulin, intel-gfx
Quoting Tvrtko Ursulin (2020-02-05 18:33:57)
>
> On 05/02/2020 16:44, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-02-05 16:22:34)
> >> On 05/02/2020 12:13, Chris Wilson wrote:
> >>> If we have a set of active engines marked as being non-persistent, we
> >>> lose track of those if the user replaces those engines with
> >>> I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
> >>> non-persistent requests are terminated if they are no longer being
> >>> tracked by the user's context (in order to prevent a lost request
> >>> causing an untracked and so unstoppable GPU hang), we need to apply the
> >>> same context cancellation upon changing engines.
> >>>
> >>> Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
> >>> Testcase: XXX
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>> drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++++++
> >>> 1 file changed, 7 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> index 52a749691a8d..20f1d3e0221f 100644
> >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> >>> @@ -1624,11 +1624,18 @@ set_engines(struct i915_gem_context *ctx,
> >>>
> >>> replace:
> >>> mutex_lock(&ctx->engines_mutex);
> >>> +
> >>> + /* Flush stale requests off the old engines if required */
> >>> + if (!i915_gem_context_is_persistent(ctx) ||
> >>> + !i915_modparams.enable_hangcheck)
> >>> + kill_context(ctx);
> >>
> >> Is the negative effect of this is legit contexts can't keep submitting
> >> and changing the map? Only if PREEMPT_TIMEOUT is disabled I think but
> >> still. Might break legitimate userspace. Not that I offer solutions.. :(
> >> Banning changing engines once context went non-persistent? That too can
> >> break someone.
> >
> > It closes the hole we have. To do otherwise, we need to keep track of
> > the old engines. Not an impossible task, certainly inconvenient.
> >
> > struct old_engines {
> > struct i915_active active;
> > struct list_head link;
> > struct i915_gem_context *ctx;
> > void *engines;
> > int num_engines;
> > };
> >
> > With a list+spinlock in the ctx that we can work in kill_context.
> >
> > The biggest catch there is actually worrying about attaching the active
> > to already executing request, and making sure the coupling doesn't bug
> > on a concurrent completion. Hmm, it's just a completion callback, but
> > more convenient to use a ready made one.
>
> What would you do with old engines? We don't have a mechanism to mark
> intel_context closed. Hm, right, it would get unreachable by definition.
> But how to terminate it if it doesn't play nicely?
Wait 30 minutes (if it passes the tests) and you'll find out :)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines
2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
2020-02-05 17:17 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2) Patchwork
@ 2020-02-05 19:39 ` Chris Wilson
2020-02-05 20:32 ` Chris Wilson
2020-02-05 22:05 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4) Patchwork
2020-02-05 22:37 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
4 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 19:39 UTC (permalink / raw)
To: intel-gfx
If we have a set of active engines marked as being non-persistent, we
lose track of those if the user replaces those engines with
I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
non-persistent requests are terminated if they are no longer being
tracked by the user's context (in order to prevent a lost request
causing an untracked and so unstoppable GPU hang), we need to apply the
same context cancellation upon changing engines.
v2: Track stale engines[] so we only reap at context closure.
Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
Testcase: igt/gem_ctx_peristence/replace
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 109 ++++++++++++++++--
.../gpu/drm/i915/gem/i915_gem_context_types.h | 11 +-
drivers/gpu/drm/i915/i915_sw_fence.c | 15 ++-
drivers/gpu/drm/i915/i915_sw_fence.h | 2 +-
4 files changed, 124 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 52a749691a8d..01ffb1d1c6a0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -270,7 +270,8 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx)
if (!e)
return ERR_PTR(-ENOMEM);
- init_rcu_head(&e->rcu);
+ e->ctx = ctx;
+
for_each_engine(engine, gt, id) {
struct intel_context *ce;
@@ -450,7 +451,7 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce)
return engine;
}
-static void kill_context(struct i915_gem_context *ctx)
+static void kill_engines(struct i915_gem_engines *engines)
{
struct i915_gem_engines_iter it;
struct intel_context *ce;
@@ -462,7 +463,7 @@ static void kill_context(struct i915_gem_context *ctx)
* However, we only care about pending requests, so only include
* engines on which there are incomplete requests.
*/
- for_each_gem_engine(ce, __context_engines_static(ctx), it) {
+ for_each_gem_engine(ce, engines, it) {
struct intel_engine_cs *engine;
if (intel_context_set_banned(ce))
@@ -484,8 +485,32 @@ static void kill_context(struct i915_gem_context *ctx)
* the context from the GPU, we have to resort to a full
* reset. We hope the collateral damage is worth it.
*/
- __reset_context(ctx, engine);
+ __reset_context(engines->ctx, engine);
+ }
+}
+
+static void kill_context(struct i915_gem_context *ctx)
+{
+ struct i915_gem_engines *pos, *next;
+
+ spin_lock(&ctx->stale_lock);
+ list_for_each_entry_safe(pos, next, &ctx->stale_list, link) {
+ if (!i915_sw_fence_await(&pos->fence))
+ continue;
+
+ spin_unlock(&ctx->stale_lock);
+
+ kill_engines(pos);
+
+ spin_lock(&ctx->stale_lock);
+ list_safe_reset_next(pos, next, link);
+ list_del_init(&pos->link);
+
+ i915_sw_fence_complete(&pos->fence);
}
+ spin_unlock(&ctx->stale_lock);
+
+ kill_engines(__context_engines_static(ctx));
}
static void set_closed_name(struct i915_gem_context *ctx)
@@ -602,6 +627,9 @@ __create_context(struct drm_i915_private *i915)
ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
mutex_init(&ctx->mutex);
+ INIT_LIST_HEAD(&ctx->stale_list);
+ spin_lock_init(&ctx->stale_lock);
+
mutex_init(&ctx->engines_mutex);
e = default_engines(ctx);
if (IS_ERR(e)) {
@@ -1529,6 +1557,66 @@ static const i915_user_extension_fn set_engines__extensions[] = {
[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
};
+static int engines_notify(struct i915_sw_fence *fence,
+ enum i915_sw_fence_notify state)
+{
+ struct i915_gem_engines *engines =
+ container_of(fence, typeof(*engines), fence);
+
+ switch (state) {
+ case FENCE_COMPLETE:
+ if (!list_empty(&engines->link)) {
+ spin_lock(&engines->ctx->stale_lock);
+ list_del(&engines->link);
+ spin_unlock(&engines->ctx->stale_lock);
+ }
+ break;
+
+ case FENCE_FREE:
+ init_rcu_head(&engines->rcu);
+ call_rcu(&engines->rcu, free_engines_rcu);
+ break;
+ }
+
+ return NOTIFY_DONE;
+}
+
+static int engines_listen(struct i915_gem_engines *engines)
+{
+ struct i915_gem_engines_iter it;
+ struct intel_context *ce;
+
+ GEM_BUG_ON(!engines);
+ i915_sw_fence_init(&engines->fence, engines_notify);
+
+ for_each_gem_engine(ce, engines, it) {
+ struct dma_fence *fence;
+ int err;
+
+ if (!ce->timeline)
+ continue;
+
+ fence = i915_active_fence_get(&ce->timeline->last_request);
+ if (!fence)
+ continue;
+
+ err = i915_sw_fence_await_dma_fence(&engines->fence,
+ fence, 0,
+ GFP_KERNEL);
+
+ dma_fence_put(fence);
+ if (err < 0)
+ return err;
+ }
+
+ spin_lock(&engines->ctx->stale_lock);
+ list_add(&engines->link, &engines->ctx->stale_list);
+ spin_unlock(&engines->ctx->stale_lock);
+
+ i915_sw_fence_commit(&engines->fence);
+ return 0;
+}
+
static int
set_engines(struct i915_gem_context *ctx,
const struct drm_i915_gem_context_param *args)
@@ -1571,7 +1659,8 @@ set_engines(struct i915_gem_context *ctx,
if (!set.engines)
return -ENOMEM;
- init_rcu_head(&set.engines->rcu);
+ set.engines->ctx = ctx;
+
for (n = 0; n < num_engines; n++) {
struct i915_engine_class_instance ci;
struct intel_engine_cs *engine;
@@ -1631,7 +1720,11 @@ set_engines(struct i915_gem_context *ctx,
set.engines = rcu_replace_pointer(ctx->engines, set.engines, 1);
mutex_unlock(&ctx->engines_mutex);
- call_rcu(&set.engines->rcu, free_engines_rcu);
+ /* Keep track of old engine sets for kill_context() */
+ if (engines_listen(set.engines)) {
+ kill_engines(set.engines); /* on error, immediately cleanup */
+ engines_notify(&set.engines->fence, FENCE_FREE);
+ }
return 0;
}
@@ -1646,7 +1739,6 @@ __copy_engines(struct i915_gem_engines *e)
if (!copy)
return ERR_PTR(-ENOMEM);
- init_rcu_head(©->rcu);
for (n = 0; n < e->num_engines; n++) {
if (e->engines[n])
copy->engines[n] = intel_context_get(e->engines[n]);
@@ -1890,7 +1982,8 @@ static int clone_engines(struct i915_gem_context *dst,
if (!clone)
goto err_unlock;
- init_rcu_head(&clone->rcu);
+ clone->ctx = dst;
+
for (n = 0; n < e->num_engines; n++) {
struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 017ca803ab47..39c2f6189684 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -20,6 +20,7 @@
#include "gt/intel_context_types.h"
#include "i915_scheduler.h"
+#include "i915_sw_fence.h"
struct pid;
@@ -30,7 +31,12 @@ struct intel_timeline;
struct intel_ring;
struct i915_gem_engines {
- struct rcu_head rcu;
+ union {
+ struct rcu_head rcu;
+ struct list_head link;
+ };
+ struct i915_sw_fence fence;
+ struct i915_gem_context *ctx;
unsigned int num_engines;
struct intel_context *engines[];
};
@@ -173,6 +179,9 @@ struct i915_gem_context {
* context in messages.
*/
char name[TASK_COMM_LEN + 8];
+
+ struct spinlock stale_lock;
+ struct list_head stale_list;
};
#endif /* __I915_GEM_CONTEXT_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 51ba97daf2a0..e47b9cf952c4 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -211,10 +211,19 @@ void i915_sw_fence_complete(struct i915_sw_fence *fence)
__i915_sw_fence_complete(fence, NULL);
}
-void i915_sw_fence_await(struct i915_sw_fence *fence)
+bool i915_sw_fence_await(struct i915_sw_fence *fence)
{
- debug_fence_assert(fence);
- WARN_ON(atomic_inc_return(&fence->pending) <= 1);
+ int old, new;
+
+ new = atomic_read(&fence->pending);
+ do {
+ if (new < 1)
+ return false;
+
+ old = new++;
+ } while((new = atomic_cmpxchg(&fence->pending, old, new)) != old);
+
+ return true;
}
void __i915_sw_fence_init(struct i915_sw_fence *fence,
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 19e806ce43bc..30a863353ee6 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -91,7 +91,7 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
unsigned long timeout,
gfp_t gfp);
-void i915_sw_fence_await(struct i915_sw_fence *fence);
+bool i915_sw_fence_await(struct i915_sw_fence *fence);
void i915_sw_fence_complete(struct i915_sw_fence *fence);
static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
--
2.25.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines
2020-02-05 19:39 ` [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
@ 2020-02-05 20:32 ` Chris Wilson
0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2020-02-05 20:32 UTC (permalink / raw)
To: intel-gfx
If we have a set of active engines marked as being non-persistent, we
lose track of those if the user replaces those engines with
I915_CONTEXT_PARAM_ENGINES. As part of our uABI contract is that
non-persistent requests are terminated if they are no longer being
tracked by the user's context (in order to prevent a lost request
causing an untracked and so unstoppable GPU hang), we need to apply the
same context cancellation upon changing engines.
v2: Track stale engines[] so we only reap at context closure.
Fixes: a0e047156cde ("drm/i915/gem: Make context persistence optional")
Testcase: igt/gem_ctx_peristence/replace
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 118 ++++++++++++++++--
.../gpu/drm/i915/gem/i915_gem_context_types.h | 11 +-
drivers/gpu/drm/i915/i915_sw_fence.c | 15 ++-
drivers/gpu/drm/i915/i915_sw_fence.h | 2 +-
4 files changed, 133 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 52a749691a8d..21ce84cc1f68 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -270,7 +270,8 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx)
if (!e)
return ERR_PTR(-ENOMEM);
- init_rcu_head(&e->rcu);
+ e->ctx = ctx;
+
for_each_engine(engine, gt, id) {
struct intel_context *ce;
@@ -450,7 +451,7 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce)
return engine;
}
-static void kill_context(struct i915_gem_context *ctx)
+static void kill_engines(struct i915_gem_engines *engines)
{
struct i915_gem_engines_iter it;
struct intel_context *ce;
@@ -462,7 +463,7 @@ static void kill_context(struct i915_gem_context *ctx)
* However, we only care about pending requests, so only include
* engines on which there are incomplete requests.
*/
- for_each_gem_engine(ce, __context_engines_static(ctx), it) {
+ for_each_gem_engine(ce, engines, it) {
struct intel_engine_cs *engine;
if (intel_context_set_banned(ce))
@@ -484,10 +485,41 @@ static void kill_context(struct i915_gem_context *ctx)
* the context from the GPU, we have to resort to a full
* reset. We hope the collateral damage is worth it.
*/
- __reset_context(ctx, engine);
+ __reset_context(engines->ctx, engine);
+ }
+}
+
+static void kill_stale_engines(struct i915_gem_context *ctx)
+{
+ if (!list_empty(&ctx->stale_list)) {
+ struct i915_gem_engines *pos, *next;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ctx->stale_lock, flags);
+ list_for_each_entry_safe(pos, next, &ctx->stale_list, link) {
+ if (!i915_sw_fence_await(&pos->fence))
+ continue;
+
+ spin_unlock_irqrestore(&ctx->stale_lock, flags);
+
+ kill_engines(pos);
+
+ spin_lock_irqsave(&ctx->stale_lock, flags);
+ list_safe_reset_next(pos, next, link);
+ list_del_init(&pos->link);
+
+ i915_sw_fence_complete(&pos->fence);
+ }
+ spin_unlock_irqrestore(&ctx->stale_lock, flags);
}
}
+static void kill_context(struct i915_gem_context *ctx)
+{
+ kill_stale_engines(ctx);
+ kill_engines(__context_engines_static(ctx));
+}
+
static void set_closed_name(struct i915_gem_context *ctx)
{
char *s;
@@ -602,6 +634,9 @@ __create_context(struct drm_i915_private *i915)
ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
mutex_init(&ctx->mutex);
+ INIT_LIST_HEAD(&ctx->stale_list);
+ spin_lock_init(&ctx->stale_lock);
+
mutex_init(&ctx->engines_mutex);
e = default_engines(ctx);
if (IS_ERR(e)) {
@@ -1529,6 +1564,71 @@ static const i915_user_extension_fn set_engines__extensions[] = {
[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
};
+static int engines_notify(struct i915_sw_fence *fence,
+ enum i915_sw_fence_notify state)
+{
+ struct i915_gem_engines *engines =
+ container_of(fence, typeof(*engines), fence);
+
+ switch (state) {
+ case FENCE_COMPLETE:
+ if (!list_empty(&engines->link)) {
+ struct i915_gem_context *ctx = engines->ctx;
+ unsigned long flags;
+
+ spin_lock_irqsave(&ctx->stale_lock, flags);
+ list_del(&engines->link);
+ spin_unlock_irqrestore(&ctx->stale_lock, flags);
+ }
+ break;
+
+ case FENCE_FREE:
+ init_rcu_head(&engines->rcu);
+ call_rcu(&engines->rcu, free_engines_rcu);
+ break;
+ }
+
+ return NOTIFY_DONE;
+}
+
+static void engines_idle_release(struct i915_gem_engines *engines)
+{
+ struct i915_gem_engines_iter it;
+ struct intel_context *ce;
+ unsigned long flags;
+
+ GEM_BUG_ON(!engines);
+ i915_sw_fence_init(&engines->fence, engines_notify);
+
+ spin_lock_irqsave(&engines->ctx->stale_lock, flags);
+ list_add(&engines->link, &engines->ctx->stale_list);
+ spin_unlock_irqrestore(&engines->ctx->stale_lock, flags);
+
+ for_each_gem_engine(ce, engines, it) {
+ struct dma_fence *fence;
+ int err;
+
+ if (!ce->timeline)
+ continue;
+
+ fence = i915_active_fence_get(&ce->timeline->last_request);
+ if (!fence)
+ continue;
+
+ err = i915_sw_fence_await_dma_fence(&engines->fence,
+ fence, 0,
+ GFP_KERNEL);
+
+ dma_fence_put(fence);
+ if (err < 0) {
+ kill_engines(engines);
+ break;
+ }
+ }
+
+ i915_sw_fence_commit(&engines->fence);
+}
+
static int
set_engines(struct i915_gem_context *ctx,
const struct drm_i915_gem_context_param *args)
@@ -1571,7 +1671,8 @@ set_engines(struct i915_gem_context *ctx,
if (!set.engines)
return -ENOMEM;
- init_rcu_head(&set.engines->rcu);
+ set.engines->ctx = ctx;
+
for (n = 0; n < num_engines; n++) {
struct i915_engine_class_instance ci;
struct intel_engine_cs *engine;
@@ -1631,7 +1732,8 @@ set_engines(struct i915_gem_context *ctx,
set.engines = rcu_replace_pointer(ctx->engines, set.engines, 1);
mutex_unlock(&ctx->engines_mutex);
- call_rcu(&set.engines->rcu, free_engines_rcu);
+ /* Keep track of old engine sets for kill_context() */
+ engines_idle_release(set.engines);
return 0;
}
@@ -1646,7 +1748,6 @@ __copy_engines(struct i915_gem_engines *e)
if (!copy)
return ERR_PTR(-ENOMEM);
- init_rcu_head(©->rcu);
for (n = 0; n < e->num_engines; n++) {
if (e->engines[n])
copy->engines[n] = intel_context_get(e->engines[n]);
@@ -1890,7 +1991,8 @@ static int clone_engines(struct i915_gem_context *dst,
if (!clone)
goto err_unlock;
- init_rcu_head(&clone->rcu);
+ clone->ctx = dst;
+
for (n = 0; n < e->num_engines; n++) {
struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 017ca803ab47..39c2f6189684 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -20,6 +20,7 @@
#include "gt/intel_context_types.h"
#include "i915_scheduler.h"
+#include "i915_sw_fence.h"
struct pid;
@@ -30,7 +31,12 @@ struct intel_timeline;
struct intel_ring;
struct i915_gem_engines {
- struct rcu_head rcu;
+ union {
+ struct rcu_head rcu;
+ struct list_head link;
+ };
+ struct i915_sw_fence fence;
+ struct i915_gem_context *ctx;
unsigned int num_engines;
struct intel_context *engines[];
};
@@ -173,6 +179,9 @@ struct i915_gem_context {
* context in messages.
*/
char name[TASK_COMM_LEN + 8];
+
+ struct spinlock stale_lock;
+ struct list_head stale_list;
};
#endif /* __I915_GEM_CONTEXT_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 51ba97daf2a0..9a20b7246f91 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -211,10 +211,19 @@ void i915_sw_fence_complete(struct i915_sw_fence *fence)
__i915_sw_fence_complete(fence, NULL);
}
-void i915_sw_fence_await(struct i915_sw_fence *fence)
+bool i915_sw_fence_await(struct i915_sw_fence *fence)
{
- debug_fence_assert(fence);
- WARN_ON(atomic_inc_return(&fence->pending) <= 1);
+ int old, new;
+
+ new = atomic_read(&fence->pending);
+ do {
+ if (new < 1)
+ return false;
+
+ old = new++;
+ } while ((new = atomic_cmpxchg(&fence->pending, old, new)) != old);
+
+ return true;
}
void __i915_sw_fence_init(struct i915_sw_fence *fence,
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 19e806ce43bc..30a863353ee6 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -91,7 +91,7 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
unsigned long timeout,
gfp_t gfp);
-void i915_sw_fence_await(struct i915_sw_fence *fence);
+bool i915_sw_fence_await(struct i915_sw_fence *fence);
void i915_sw_fence_complete(struct i915_sw_fence *fence);
static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
--
2.25.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4)
2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
` (2 preceding siblings ...)
2020-02-05 19:39 ` [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
@ 2020-02-05 22:05 ` Patchwork
2020-02-05 22:37 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-02-05 22:05 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4)
URL : https://patchwork.freedesktop.org/series/73023/
State : warning
== Summary ==
$ dim checkpatch origin/drm-tip
66727965053c drm/i915/gem: Don't leak non-persistent requests on changing engines
-:248: WARNING:USE_SPINLOCK_T: struct spinlock should be spinlock_t
#248: FILE: drivers/gpu/drm/i915/gem/i915_gem_context_types.h:183:
+ struct spinlock stale_lock;
total: 0 errors, 1 warnings, 0 checks, 240 lines checked
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4)
2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
` (3 preceding siblings ...)
2020-02-05 22:05 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4) Patchwork
@ 2020-02-05 22:37 ` Patchwork
4 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-02-05 22:37 UTC (permalink / raw)
To: Chris Wilson; +Cc: intel-gfx
== Series Details ==
Series: drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4)
URL : https://patchwork.freedesktop.org/series/73023/
State : failure
== Summary ==
CI Bug Log - changes from CI_DRM_7871 -> Patchwork_16442
====================================================
Summary
-------
**FAILURE**
Serious unknown changes coming with Patchwork_16442 absolutely need to be
verified manually.
If you think the reported changes have nothing to do with the changes
introduced in Patchwork_16442, please notify your bug team to allow them
to document this new failure mode, which will reduce false positives in CI.
External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/index.html
Possible new issues
-------------------
Here are the unknown changes that may have been introduced in Patchwork_16442:
### IGT changes ###
#### Possible regressions ####
* igt@i915_selftest@live_gem_contexts:
- fi-kbl-x1275: [PASS][1] -> [DMESG-FAIL][2]
[1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-kbl-x1275/igt@i915_selftest@live_gem_contexts.html
[2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-kbl-x1275/igt@i915_selftest@live_gem_contexts.html
Known issues
------------
Here are the changes found in Patchwork_16442 that come from known issues:
### IGT changes ###
#### Issues hit ####
* igt@i915_selftest@live_blt:
- fi-hsw-4770r: [PASS][3] -> [DMESG-FAIL][4] ([i915#553] / [i915#725])
[3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-hsw-4770r/igt@i915_selftest@live_blt.html
[4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-hsw-4770r/igt@i915_selftest@live_blt.html
- fi-hsw-4770: [PASS][5] -> [DMESG-FAIL][6] ([i915#725])
[5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-hsw-4770/igt@i915_selftest@live_blt.html
[6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-hsw-4770/igt@i915_selftest@live_blt.html
* igt@i915_selftest@live_execlists:
- fi-icl-y: [PASS][7] -> [DMESG-FAIL][8] ([fdo#108569])
[7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-icl-y/igt@i915_selftest@live_execlists.html
[8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-icl-y/igt@i915_selftest@live_execlists.html
* igt@kms_chamelium@hdmi-hpd-fast:
- fi-icl-u2: [PASS][9] -> [FAIL][10] ([i915#217])
[9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html
[10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-icl-u2/igt@kms_chamelium@hdmi-hpd-fast.html
* igt@prime_self_import@basic-llseek-size:
- fi-tgl-y: [PASS][11] -> [DMESG-WARN][12] ([CI#94] / [i915#402]) +1 similar issue
[11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-tgl-y/igt@prime_self_import@basic-llseek-size.html
[12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-tgl-y/igt@prime_self_import@basic-llseek-size.html
#### Possible fixes ####
* igt@i915_selftest@live_gtt:
- fi-icl-guc: [TIMEOUT][13] ([fdo#112271]) -> [PASS][14]
[13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-icl-guc/igt@i915_selftest@live_gtt.html
[14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-icl-guc/igt@i915_selftest@live_gtt.html
* igt@kms_addfb_basic@bad-pitch-128:
- fi-tgl-y: [DMESG-WARN][15] ([CI#94] / [i915#402]) -> [PASS][16] +1 similar issue
[15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_7871/fi-tgl-y/igt@kms_addfb_basic@bad-pitch-128.html
[16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/fi-tgl-y/igt@kms_addfb_basic@bad-pitch-128.html
[CI#94]: https://gitlab.freedesktop.org/gfx-ci/i915-infra/issues/94
[fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
[fdo#112271]: https://bugs.freedesktop.org/show_bug.cgi?id=112271
[i915#217]: https://gitlab.freedesktop.org/drm/intel/issues/217
[i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
[i915#553]: https://gitlab.freedesktop.org/drm/intel/issues/553
[i915#725]: https://gitlab.freedesktop.org/drm/intel/issues/725
Participating hosts (45 -> 42)
------------------------------
Additional (5): fi-bdw-5557u fi-hsw-peppy fi-bwr-2160 fi-ilk-650 fi-kbl-7500u
Missing (8): fi-bsw-n3050 fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-ivb-3770 fi-byt-clapper fi-bdw-samus fi-kbl-r
Build changes
-------------
* CI: CI-20190529 -> None
* Linux: CI_DRM_7871 -> Patchwork_16442
CI-20190529: 20190529
CI_DRM_7871: c9b0237ee7ffb1bbb62f864f0b2d7b290ee1313d @ git://anongit.freedesktop.org/gfx-ci/linux
IGT_5419: 44913a91e77434b03001bb9ea53216cd03c476e6 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
Patchwork_16442: 66727965053c0e896d166743999590651bf4ca80 @ git://anongit.freedesktop.org/gfx-ci/linux
== Linux commits ==
66727965053c drm/i915/gem: Don't leak non-persistent requests on changing engines
== Logs ==
For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_16442/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2020-02-05 22:37 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-05 12:11 [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
2020-02-05 12:13 ` [Intel-gfx] [PATCH v2] " Chris Wilson
2020-02-05 16:22 ` Tvrtko Ursulin
2020-02-05 16:44 ` Chris Wilson
2020-02-05 18:33 ` Tvrtko Ursulin
2020-02-05 18:39 ` Chris Wilson
2020-02-05 17:17 ` [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev2) Patchwork
2020-02-05 19:39 ` [Intel-gfx] [PATCH] drm/i915/gem: Don't leak non-persistent requests on changing engines Chris Wilson
2020-02-05 20:32 ` Chris Wilson
2020-02-05 22:05 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gem: Don't leak non-persistent requests on changing engines (rev4) Patchwork
2020-02-05 22:37 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.