* [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
@ 2020-10-13 10:32 Chris Wilson
  2020-10-13 12:27 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Chris Wilson @ 2020-10-13 10:32 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Since removing dev->struct_mutex usage, we only use i915->wq for batched
freeing of GEM objects and ppGTT, so it is essential for memory reclaim.
If we let the workqueue dawdle, we trap excess amounts of memory, so give
it a priority boost. Since we no longer depend on a singular mutex, we
could run unbounded, but first let's try to keep some constraint upon the
worker.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: CQ Tang <cq.tang@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 8bb7e2dcfaaa..8c9198f0d2ad 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -219,20 +219,10 @@ intel_teardown_mchbar(struct drm_i915_private *dev_priv)
 static int i915_workqueues_init(struct drm_i915_private *dev_priv)
 {
 	/*
-	 * The i915 workqueue is primarily used for batched retirement of
-	 * requests (and thus managing bo) once the task has been completed
-	 * by the GPU. i915_retire_requests() is called directly when we
-	 * need high-priority retirement, such as waiting for an explicit
-	 * bo.
-	 *
-	 * It is also used for periodic low-priority events, such as
-	 * idle-timers and recording error state.
-	 *
-	 * All tasks on the workqueue are expected to acquire the dev mutex
-	 * so there is no point in running more than one instance of the
-	 * workqueue at any time.  Use an ordered one.
+	 * The i915 workqueue is primarily used for batched freeing of
+	 * GEM objects and ppGTT, and is essential for memory reclaim.
 	 */
-	dev_priv->wq = alloc_ordered_workqueue("i915", 0);
+	dev_priv->wq = alloc_ordered_workqueue("i915", WQ_HIGHPRI);
 	if (dev_priv->wq == NULL)
 		goto out_err;
 
-- 
2.20.1


* [Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-13 10:32 [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority Chris Wilson
@ 2020-10-13 12:27 ` Patchwork
  2020-10-13 16:19 ` [Intel-gfx] [PATCH] " Tang, CQ
  2020-10-14  3:19 ` [Intel-gfx] ✗ Fi.CI.IGT: failure for " Patchwork
  2 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-10-13 12:27 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx



== Series Details ==

Series: drm/i915: Make the GEM reclaim workqueue high priority
URL   : https://patchwork.freedesktop.org/series/82616/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_9135 -> Patchwork_18685
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/index.html

Known issues
------------

  Here are the changes found in Patchwork_18685 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_tiled_blits@basic:
    - fi-tgl-y:           [PASS][1] -> [DMESG-WARN][2] ([i915#402]) +1 similar issue
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-tgl-y/igt@gem_tiled_blits@basic.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-tgl-y/igt@gem_tiled_blits@basic.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-bsw-kefka:       [PASS][3] -> [DMESG-WARN][4] ([i915#1982])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-bsw-kefka/igt@i915_pm_rpm@basic-pci-d3-state.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-bsw-kefka/igt@i915_pm_rpm@basic-pci-d3-state.html

  * igt@i915_selftest@live@gt_heartbeat:
    - fi-tgl-u2:          [PASS][5] -> [INCOMPLETE][6] ([i915#2557])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-tgl-u2/igt@i915_selftest@live@gt_heartbeat.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-tgl-u2/igt@i915_selftest@live@gt_heartbeat.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-byt-j1900:       [PASS][7] -> [DMESG-WARN][8] ([i915#1982])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-byt-j1900/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-byt-j1900/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_flip@basic-flip-vs-modeset@d-edp1:
    - fi-tgl-y:           [PASS][9] -> [DMESG-WARN][10] ([i915#1982])
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-tgl-y/igt@kms_flip@basic-flip-vs-modeset@d-edp1.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-tgl-y/igt@kms_flip@basic-flip-vs-modeset@d-edp1.html

  
#### Possible fixes ####

  * igt@gem_flink_basic@bad-flink:
    - fi-tgl-y:           [DMESG-WARN][11] ([i915#402]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-tgl-y/igt@gem_flink_basic@bad-flink.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-tgl-y/igt@gem_flink_basic@bad-flink.html

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-byt-j1900:       [DMESG-WARN][13] ([i915#1982]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-byt-j1900/igt@i915_pm_rpm@basic-pci-d3-state.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-byt-j1900/igt@i915_pm_rpm@basic-pci-d3-state.html
    - fi-bsw-n3050:       [DMESG-WARN][15] ([i915#1982]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-bsw-n3050/igt@i915_pm_rpm@basic-pci-d3-state.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-bsw-n3050/igt@i915_pm_rpm@basic-pci-d3-state.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-apl-guc:         [DMESG-WARN][17] ([i915#1635] / [i915#1982]) -> [PASS][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-apl-guc/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-apl-guc/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-c:
    - {fi-tgl-dsi}:       [DMESG-WARN][19] ([i915#1982]) -> [PASS][20]
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/fi-tgl-dsi/igt@kms_pipe_crc_basic@read-crc-pipe-c.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/fi-tgl-dsi/igt@kms_pipe_crc_basic@read-crc-pipe-c.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2557]: https://gitlab.freedesktop.org/drm/intel/issues/2557
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402


Participating hosts (47 -> 40)
------------------------------

  Missing    (7): fi-cml-u2 fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_9135 -> Patchwork_18685

  CI-20190529: 20190529
  CI_DRM_9135: eb70ad33fcc91d3464b07679391fb477927ad4c7 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5813: d4e6dd955a1dad02271aa41c9389f5097ee17765 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18685: faa458087c905b3273a2f7359e337725180fb019 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

faa458087c90 drm/i915: Make the GEM reclaim workqueue high priority

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/index.html



* Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-13 10:32 [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority Chris Wilson
  2020-10-13 12:27 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
@ 2020-10-13 16:19 ` Tang, CQ
  2020-10-13 16:24   ` Chris Wilson
  2020-10-14  3:19 ` [Intel-gfx] ✗ Fi.CI.IGT: failure for " Patchwork
  2 siblings, 1 reply; 11+ messages in thread
From: Tang, CQ @ 2020-10-13 16:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris,
    I tested this patch. It is still not enough; I keep running out of lmem. Every worker invocation has a larger and larger count of objects to free.

Here is my debugging code:

+/* debug: count objects added to free_list since the last queue_work() */
+static int obj_count = 0;
+
......
+       if (llist_add(&obj->freed, &i915->mm.free_list)) {
+               /* list was empty: kick the free worker and log the backlog */
+               bool b;
+               b = queue_work(i915->wq, &i915->mm.free_work);
+               pr_err("queue_work: %d, %d; %d\n", atomic_read(&i915->mm.free_count), obj_count, b);
+               obj_count = 1;
+       } else {
+               obj_count++;
+       }

And here is the output:

[  821.213680] queue_work: 108068, 105764; 1
[  823.309468] queue_work: 148843, 147948; 1
[  826.453132] queue_work: 220000, 218123; 1
[  831.522506] queue_work: 334812, 333773; 1
[  840.040571] queue_work: 539650, 538922; 1
[  860.337644] queue_work: 960811, 1017158; 1

The second number, 'obj_count', is the number of objects the last worker invocation had to free.

--CQ


> -----Original Message-----
> From: Chris Wilson <chris@chris-wilson.co.uk>
> Sent: Tuesday, October 13, 2020 3:33 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: Chris Wilson <chris@chris-wilson.co.uk>; Tang, CQ <cq.tang@intel.com>
> Subject: [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
> 
> Since removing dev->struct_mutex usage, we only use i915->wq for batched
> freeing of GEM objects and ppGTT, so it is essential for memory reclaim. If
> we let the workqueue dawdle, we trap excess amounts of memory, so give it a
> priority boost. Since we no longer depend on a singular mutex, we could run
> unbounded, but first let's try to keep some constraint upon the worker.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: CQ Tang <cq.tang@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 16 +++-------------
>  1 file changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 8bb7e2dcfaaa..8c9198f0d2ad 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -219,20 +219,10 @@ intel_teardown_mchbar(struct drm_i915_private *dev_priv)
>  static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>  {
>  	/*
> -	 * The i915 workqueue is primarily used for batched retirement of
> -	 * requests (and thus managing bo) once the task has been completed
> -	 * by the GPU. i915_retire_requests() is called directly when we
> -	 * need high-priority retirement, such as waiting for an explicit
> -	 * bo.
> -	 *
> -	 * It is also used for periodic low-priority events, such as
> -	 * idle-timers and recording error state.
> -	 *
> -	 * All tasks on the workqueue are expected to acquire the dev mutex
> -	 * so there is no point in running more than one instance of the
> -	 * workqueue at any time.  Use an ordered one.
> +	 * The i915 workqueue is primarily used for batched freeing of
> +	 * GEM objects and ppGTT, and is essential for memory reclaim.
>  	 */
> -	dev_priv->wq = alloc_ordered_workqueue("i915", 0);
> +	dev_priv->wq = alloc_ordered_workqueue("i915", WQ_HIGHPRI);
>  	if (dev_priv->wq == NULL)
>  		goto out_err;
> 
> --
> 2.20.1


* Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-13 16:19 ` [Intel-gfx] [PATCH] " Tang, CQ
@ 2020-10-13 16:24   ` Chris Wilson
  2020-10-13 16:40     ` Tang, CQ
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-10-13 16:24 UTC (permalink / raw)
  To: Tang, CQ, intel-gfx

Quoting Tang, CQ (2020-10-13 17:19:27)
> Chris,
>     I tested this patch. It is still not enough; I keep running out of lmem.  Every worker invocation has a larger and larger count of objects to free.
> 

Was that with the immediate call (not via call_rcu) to
__i915_gem_free_object_rcu?

If this brings the freelist under control, the next item is judicious
use of cond_synchronize_rcu(). We just have to make sure we penalize the
right hog.

Otherwise, we have to shotgun apply i915_gem_flush_free_objects() and
still find somewhere to put the rcu sync.
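
To make the two options concrete, a minimal sketch (shapes illustrative
only; 'rcu_cookie' is a hypothetical field, not a driver member):

	/* Option A, today's shape: defer the final free past a grace period. */
	call_rcu(&obj->rcu, __i915_gem_free_object_rcu);

	/* Option B: free immediately, but record an RCU cookie when the
	 * object enters the free path and only block if that grace period
	 * has not yet elapsed, so the stall lands on the allocation hog
	 * rather than on the worker.
	 */
	obj->rcu_cookie = get_state_synchronize_rcu();	/* at free time */
	...
	cond_synchronize_rcu(obj->rcu_cookie);	/* before the final free */
	__i915_gem_free_object_rcu(&obj->rcu);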
-Chris

* Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-13 16:24   ` Chris Wilson
@ 2020-10-13 16:40     ` Tang, CQ
  2020-10-13 23:29       ` Tang, CQ
  0 siblings, 1 reply; 11+ messages in thread
From: Tang, CQ @ 2020-10-13 16:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



> -----Original Message-----
> From: Chris Wilson <chris@chris-wilson.co.uk>
> Sent: Tuesday, October 13, 2020 9:25 AM
> To: Tang, CQ <cq.tang@intel.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> high priority
> 
> Quoting Tang, CQ (2020-10-13 17:19:27)
> > Chris,
> >     I tested this patch. It is still not enough; I keep running out of lmem.
> > Every worker invocation has a larger and larger count of objects to free.
> >
> 
> Was that with the immediate call (not via call_rcu) to
> __i915_gem_free_object_rcu?
> 
> If this brings the freelist under control, the next item is judicious use of
> cond_synchronize_rcu(). We just have to make sure we penalize the right
> hog.
> 
> Otherwise, we have to shotgun apply i915_gem_flush_free_objects() and
> still find somewhere to put the rcu sync.

This is with call_rcu().

Then I removed cond_resched(); it did not help. Then I called
__i915_gem_free_object_rcu() directly and still hit the same error. However,
I noticed that sometimes queue_work() returns false, which means the work is
already queued. How can that be? The worker had already been called, so
'free_list' should be empty:

[  117.381888] queue_work: 107967, 107930; 1
[  119.180230] queue_work: 125531, 125513; 1
[  121.349308] queue_work: 155017, 154996; 1
[  124.214885] queue_work: 193918, 193873; 1
[  127.967260] queue_work: 256838, 256776; 1
[  133.281045] queue_work: 345753, 345734; 1
[  141.457995] queue_work: 516943, 516859; 1
[  156.264420] queue_work: 863622, 863516; 1
[  156.322619] queue_work: 865849, 3163; 0
[  156.448551] queue_work: 865578, 7141; 0
[  156.882985] queue_work: 866984, 24138; 0
[  157.952163] queue_work: 862902, 53365; 0
[  159.838412] queue_work: 842522, 95504; 0
[  174.321508] queue_work: 937179, 657323; 0
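
For reference, the pattern in play, sketched from the shape of the free
path in this era (simplified, not the exact driver code): queue_work()
returns false only while the work is pending, i.e. queued but not yet
executing, and a still-running worker instance keeps draining 'free_list'
in the meantime, so "list was empty" and "work already queued" can both
be true at once.

	/* producer, any context */
	if (llist_add(&obj->freed, &i915->mm.free_list))
		/* first object on an empty list: (re)arm the worker.
		 * Returns false iff the work is already pending, i.e. a
		 * previous producer queued it and it has not begun
		 * executing yet. */
		queue_work(i915->wq, &i915->mm.free_work);

	/* worker */
	static void __i915_gem_free_work(struct work_struct *work)
	{
		struct drm_i915_private *i915 =
			container_of(work, struct drm_i915_private, mm.free_work);
		struct llist_node *freed;

		/* a running instance keeps draining, including objects
		 * added after a new instance was queued behind it */
		while ((freed = llist_del_all(&i915->mm.free_list)))
			__i915_gem_free_objects(i915, freed);
	}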

--CQ

> -Chris

* Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-13 16:40     ` Tang, CQ
@ 2020-10-13 23:29       ` Tang, CQ
  2020-10-15 15:06         ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Tang, CQ @ 2020-10-13 23:29 UTC (permalink / raw)
  To: Tang, CQ, Chris Wilson, intel-gfx

i915_gem_free_object() is called by multiple threads/processes, and they all add objects onto the same free_list. The free_list processing worker thread becomes the bottleneck. I see that the worker is mostly a single thread (with a particular thread ID), but sometimes multiple threads are launched to process the 'free_list' work concurrently. Even so, the processing speed is still slower than the feeding speed of the multiple processes, and 'free_list' is holding more and more memory.

The worker launch is delayed a lot: we call queue_work() when we add the first object onto the empty 'free_list', but by the time the worker runs, 'free_list' has sometimes accumulated 1M objects. Maybe that is because it waits for the currently running worker to finish?

This happens with a direct call to __i915_gem_free_object_rcu() and no cond_resched().

--CQ

> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of
> Tang, CQ
> Sent: Tuesday, October 13, 2020 9:41 AM
> To: Chris Wilson <chris@chris-wilson.co.uk>; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> high priority
> 
> 
> 
> > -----Original Message-----
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > Sent: Tuesday, October 13, 2020 9:25 AM
> > To: Tang, CQ <cq.tang@intel.com>; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim
> > workqueue high priority
> >
> > Quoting Tang, CQ (2020-10-13 17:19:27)
> > > Chris,
> > >     I tested this patch. It is still not enough; I keep running out of
> > > lmem. Every worker invocation has a larger and larger count of objects
> > > to free.
> > >
> >
> > Was that with the immediate call (not via call_rcu) to
> > __i915_gem_free_object_rcu?
> >
> > If this brings the freelist under control, the next item is judicious
> > use of cond_synchronize_rcu(). We just have to make sure we penalize
> > the right hog.
> >
> > Otherwise, we have to shotgun apply i915_gem_flush_free_objects() and
> > still find somewhere to put the rcu sync.
> 
> This is with call_rcu().
> 
> Then I removed cond_resched(); it did not help. Then I called
> __i915_gem_free_object_rcu() directly and still hit the same error. However,
> I noticed that sometimes queue_work() returns false, which means the work
> is already queued. How can that be? The worker had already been called, so
> 'free_list' should be empty:
> 
> [  117.381888] queue_work: 107967, 107930; 1
> [  119.180230] queue_work: 125531, 125513; 1
> [  121.349308] queue_work: 155017, 154996; 1
> [  124.214885] queue_work: 193918, 193873; 1
> [  127.967260] queue_work: 256838, 256776; 1
> [  133.281045] queue_work: 345753, 345734; 1
> [  141.457995] queue_work: 516943, 516859; 1
> [  156.264420] queue_work: 863622, 863516; 1
> [  156.322619] queue_work: 865849, 3163; 0
> [  156.448551] queue_work: 865578, 7141; 0
> [  156.882985] queue_work: 866984, 24138; 0
> [  157.952163] queue_work: 862902, 53365; 0
> [  159.838412] queue_work: 842522, 95504; 0
> [  174.321508] queue_work: 937179, 657323; 0
> 
> --CQ
> 
> > -Chris

* [Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-13 10:32 [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority Chris Wilson
  2020-10-13 12:27 ` [Intel-gfx] ✓ Fi.CI.BAT: success for " Patchwork
  2020-10-13 16:19 ` [Intel-gfx] [PATCH] " Tang, CQ
@ 2020-10-14  3:19 ` Patchwork
  2 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2020-10-14  3:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx



== Series Details ==

Series: drm/i915: Make the GEM reclaim workqueue high priority
URL   : https://patchwork.freedesktop.org/series/82616/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_9135_full -> Patchwork_18685_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_18685_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_18685_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_18685_full:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-shrfb-plflip-blt:
    - shard-tglb:         [PASS][1] -> [FAIL][2] +2 similar issues
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-tglb6/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-shrfb-plflip-blt.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-tglb7/igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-shrfb-plflip-blt.html

  
#### Warnings ####

  * igt@kms_frontbuffer_tracking@fbcpsr-tiling-y:
    - shard-tglb:         [DMESG-FAIL][3] ([i915#1982]) -> [FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-tglb8/igt@kms_frontbuffer_tracking@fbcpsr-tiling-y.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-tglb5/igt@kms_frontbuffer_tracking@fbcpsr-tiling-y.html

  
Known issues
------------

  Here are the changes found in Patchwork_18685_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@feature_discovery@psr2:
    - shard-iclb:         [PASS][5] -> [SKIP][6] ([i915#658])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-iclb2/igt@feature_discovery@psr2.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-iclb5/igt@feature_discovery@psr2.html

  * igt@gem_exec_reloc@basic-many-active@vecs0:
    - shard-glk:          [PASS][7] -> [FAIL][8] ([i915#2389]) +2 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-glk1/igt@gem_exec_reloc@basic-many-active@vecs0.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-glk5/igt@gem_exec_reloc@basic-many-active@vecs0.html

  * igt@gem_exec_suspend@basic-s4-devices:
    - shard-glk:          [PASS][9] -> [DMESG-WARN][10] ([i915#118] / [i915#95]) +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-glk5/igt@gem_exec_suspend@basic-s4-devices.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-glk9/igt@gem_exec_suspend@basic-s4-devices.html

  * igt@gem_userptr_blits@unsync-unmap-cycles:
    - shard-skl:          [PASS][11] -> [TIMEOUT][12] ([i915#2424])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-skl10/igt@gem_userptr_blits@unsync-unmap-cycles.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-skl9/igt@gem_userptr_blits@unsync-unmap-cycles.html

  * igt@i915_module_load@reload:
    - shard-skl:          [PASS][13] -> [DMESG-WARN][14] ([i915#1982]) +4 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-skl1/igt@i915_module_load@reload.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-skl3/igt@i915_module_load@reload.html

  * igt@kms_cursor_edge_walk@pipe-b-256x256-left-edge:
    - shard-glk:          [PASS][15] -> [DMESG-WARN][16] ([i915#1982])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-glk7/igt@kms_cursor_edge_walk@pipe-b-256x256-left-edge.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-glk8/igt@kms_cursor_edge_walk@pipe-b-256x256-left-edge.html

  * igt@kms_frontbuffer_tracking@psr-modesetfrombusy:
    - shard-tglb:         [PASS][17] -> [DMESG-WARN][18] ([i915#1982])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-tglb5/igt@kms_frontbuffer_tracking@psr-modesetfrombusy.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-tglb2/igt@kms_frontbuffer_tracking@psr-modesetfrombusy.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-c-frame-sequence:
    - shard-skl:          [PASS][19] -> [FAIL][20] ([i915#53])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-skl7/igt@kms_pipe_crc_basic@read-crc-pipe-c-frame-sequence.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-skl8/igt@kms_pipe_crc_basic@read-crc-pipe-c-frame-sequence.html

  * igt@kms_psr@psr2_primary_render:
    - shard-iclb:         [PASS][21] -> [SKIP][22] ([fdo#109441])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-iclb2/igt@kms_psr@psr2_primary_render.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-iclb8/igt@kms_psr@psr2_primary_render.html

  
#### Possible fixes ####

  * {igt@gem_exec_capture@pi@bcs0}:
    - shard-glk:          [INCOMPLETE][23] -> [PASS][24]
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-glk1/igt@gem_exec_capture@pi@bcs0.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-glk1/igt@gem_exec_capture@pi@bcs0.html

  * {igt@kms_async_flips@async-flip-with-page-flip-events}:
    - shard-kbl:          [FAIL][25] ([i915#2521]) -> [PASS][26] +1 similar issue
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-kbl2/igt@kms_async_flips@async-flip-with-page-flip-events.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-kbl2/igt@kms_async_flips@async-flip-with-page-flip-events.html
    - shard-tglb:         [FAIL][27] ([i915#2521]) -> [PASS][28]
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-tglb1/igt@kms_async_flips@async-flip-with-page-flip-events.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-tglb2/igt@kms_async_flips@async-flip-with-page-flip-events.html

  * igt@kms_big_fb@y-tiled-16bpp-rotate-0:
    - shard-skl:          [DMESG-WARN][29] ([i915#1982]) -> [PASS][30] +5 similar issues
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-skl6/igt@kms_big_fb@y-tiled-16bpp-rotate-0.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-skl10/igt@kms_big_fb@y-tiled-16bpp-rotate-0.html

  * igt@kms_cursor_legacy@pipe-b-torture-move:
    - shard-tglb:         [DMESG-WARN][31] ([i915#128]) -> [PASS][32] +1 similar issue
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-tglb7/igt@kms_cursor_legacy@pipe-b-torture-move.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-tglb8/igt@kms_cursor_legacy@pipe-b-torture-move.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-glk:          [DMESG-FAIL][33] ([i915#118] / [i915#95]) -> [PASS][34]
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-glk3/igt@kms_fbcon_fbt@fbc-suspend.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-glk9/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip@flip-vs-expired-vblank-interruptible@c-dp1:
    - shard-apl:          [FAIL][35] ([i915#1635] / [i915#79]) -> [PASS][36]
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-apl8/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-dp1.html
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-apl1/igt@kms_flip@flip-vs-expired-vblank-interruptible@c-dp1.html

  * igt@kms_flip@flip-vs-suspend-interruptible@c-edp1:
    - shard-skl:          [INCOMPLETE][37] ([i915#198]) -> [PASS][38] +1 similar issue
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-skl7/igt@kms_flip@flip-vs-suspend-interruptible@c-edp1.html
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-skl8/igt@kms_flip@flip-vs-suspend-interruptible@c-edp1.html

  * igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1:
    - shard-skl:          [FAIL][39] ([i915#2122]) -> [PASS][40] +1 similar issue
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-skl8/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-skl6/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-msflip-blt:
    - shard-tglb:         [FAIL][41] -> [PASS][42] +3 similar issues
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-tglb7/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-msflip-blt.html
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-tglb2/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-msflip-blt.html

  * igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary:
    - shard-iclb:         [DMESG-WARN][43] ([i915#1982]) -> [PASS][44]
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-iclb8/igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary.html
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-iclb2/igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary.html

  * igt@kms_hdr@bpc-switch-dpms:
    - shard-skl:          [FAIL][45] ([i915#1188]) -> [PASS][46]
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-skl5/igt@kms_hdr@bpc-switch-dpms.html
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-skl8/igt@kms_hdr@bpc-switch-dpms.html

  * igt@kms_psr@psr2_basic:
    - shard-iclb:         [SKIP][47] ([fdo#109441]) -> [PASS][48]
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-iclb8/igt@kms_psr@psr2_basic.html
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-iclb2/igt@kms_psr@psr2_basic.html

  * igt@perf@blocking:
    - shard-skl:          [FAIL][49] ([i915#1542]) -> [PASS][50]
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-skl10/igt@perf@blocking.html
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-skl7/igt@perf@blocking.html

  * igt@sysfs_heartbeat_interval@mixed@vcs1:
    - shard-kbl:          [INCOMPLETE][51] ([i915#1731]) -> [PASS][52]
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-kbl4/igt@sysfs_heartbeat_interval@mixed@vcs1.html
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-kbl6/igt@sysfs_heartbeat_interval@mixed@vcs1.html

  
#### Warnings ####

  * igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend:
    - shard-skl:          [DMESG-WARN][53] ([i915#1982]) -> [INCOMPLETE][54] ([i915#198])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9135/shard-skl8/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/shard-skl2/igt@kms_vblank@pipe-b-ts-continuation-dpms-suspend.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [i915#118]: https://gitlab.freedesktop.org/drm/intel/issues/118
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#128]: https://gitlab.freedesktop.org/drm/intel/issues/128
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#1731]: https://gitlab.freedesktop.org/drm/intel/issues/1731
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2389]: https://gitlab.freedesktop.org/drm/intel/issues/2389
  [i915#2424]: https://gitlab.freedesktop.org/drm/intel/issues/2424
  [i915#2521]: https://gitlab.freedesktop.org/drm/intel/issues/2521
  [i915#53]: https://gitlab.freedesktop.org/drm/intel/issues/53
  [i915#658]: https://gitlab.freedesktop.org/drm/intel/issues/658
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * Linux: CI_DRM_9135 -> Patchwork_18685

  CI-20190529: 20190529
  CI_DRM_9135: eb70ad33fcc91d3464b07679391fb477927ad4c7 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5813: d4e6dd955a1dad02271aa41c9389f5097ee17765 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18685: faa458087c905b3273a2f7359e337725180fb019 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18685/index.html



* Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-13 23:29       ` Tang, CQ
@ 2020-10-15 15:06         ` Chris Wilson
  2020-10-15 20:09           ` Tang, CQ
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-10-15 15:06 UTC (permalink / raw)
  To: Tang, CQ, intel-gfx

Quoting Tang, CQ (2020-10-14 00:29:13)
> i915_gem_free_object() is called by multiple threads/processes, and they all
> add objects onto the same free_list. The free_list processing worker thread
> becomes the bottleneck. I see that the worker is mostly a single thread (with
> a particular thread ID), but sometimes multiple threads are launched to
> process the 'free_list' work concurrently. Even so, the processing speed is
> still slower than the feeding speed of the multiple processes, and
> 'free_list' is holding more and more memory.

We can also prune the free_list immediately, if we know we are outside
of any critical section. (We do this before create ioctls, and I thought
upon close(device), but I see that's just contexts.)
 
> The worker launch is delayed a lot: we call queue_work() when we add the
> first object onto the empty 'free_list', but by the time the worker runs,
> 'free_list' has sometimes accumulated 1M objects. Maybe that is because it
> waits for the currently running worker to finish?

1M is a lot more than is comfortable, and that's even with a high-priority
worker.  The problem with objects being freed from any context is that we can't
simply put a flush_work around there. (Not without ridding ourselves of
a few mutexes at least.) We could try more than one worker, but it's no
more effort to starve 2 cpus than it is to starve 1.

No, with that much pressure the only option is to apply the backpressure
at the point of allocation ala create_ioctl. i.e. find the hog, and look
to see if there's a convenient spot before/after to call
i915_gem_flush_free_objects(). Since you highlight the vma-stash as the
likely culprit, and the free_pt_stash is unlikely to be inside any
critical section, might as well try flushing from there for starters.
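
As a sketch of that experiment (assuming the i915_vm_free_pt_stash() of
this era; the flush placement is the thing being tested, not settled
code):

	void i915_vm_free_pt_stash(struct i915_address_space *vm,
				   struct i915_vm_pt_stash *stash)
	{
		/* ... release the unused page tables held in the stash ... */

		/*
		 * Experiment: drain the deferred-free list here, outside
		 * any critical section, so a PD-heavy workload pays for
		 * its own garbage instead of outrunning the free worker.
		 */
		i915_gem_flush_free_objects(vm->i915);
	}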

Hmm, actually we are tantalizingly close to having dropped all mutexes
(and similar global lock-like effects) from free_objects. That would be
a nice victory.
-Chris

* Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-15 15:06         ` Chris Wilson
@ 2020-10-15 20:09           ` Tang, CQ
  2020-10-15 20:32             ` Chris Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: Tang, CQ @ 2020-10-15 20:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



> -----Original Message-----
> From: Chris Wilson <chris@chris-wilson.co.uk>
> Sent: Thursday, October 15, 2020 8:07 AM
> To: Tang, CQ <cq.tang@intel.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> high priority
> 
> Quoting Tang, CQ (2020-10-14 00:29:13)
> > i915_gem_free_object() is called by multiple threads/processes, and they
> > all add objects onto the same free_list. The free_list processing worker
> > thread becomes the bottleneck. I see that the worker is mostly a single
> > thread (with a particular thread ID), but sometimes multiple threads are
> > launched to process the 'free_list' work concurrently. Even so, the
> > processing speed is still slower than the feeding speed of the multiple
> > processes, and 'free_list' is holding more and more memory.
> 
> We can also prune the free_list immediately, if we know we are outside of
> any critical section. (We do this before create ioctls, and I thought upon
> close(device), but I see that's just contexts.)
> 
> > The worker launch is delayed a lot: we call queue_work() when we add the
> > first object onto the empty 'free_list', but by the time the worker runs,
> > 'free_list' has sometimes accumulated 1M objects. Maybe that is because it
> > waits for the currently running worker to finish?
> 
> 1M is a lot more than is comfortable, and that's even with a high-priority
> worker.  The problem with objects being freed from any context is that we
> can't simply put a flush_work around there. (Not without ridding ourselves of
> a few mutexes at least.) We could try more than one worker, but it's no
> more effort to starve 2 cpus than it is to starve 1.
> 
> No, with that much pressure the only option is to apply the backpressure at
> the point of allocation ala create_ioctl. i.e. find the hog, and look to see if
> there's a convenient spot before/after to call
> i915_gem_flush_free_objects(). Since you highlight the vma-stash as the
> likely culprit, and the free_pt_stash is unlikely to be inside any critical section,
> might as well try flushing from there for starters.

I have not yet tested, but I guess calling i915_gem_flush_free_objects() inside free_pt_stash() will solve the problem that gem_exec_gttfill hits, because it will apply some backpressure on the system traffic.

But this is only for the 4K lmem page-table objects allocated/freed by the vma stash. We might encounter the same situation with userspace-allocated objects.

--CQ

> 
> Hmm, actually we are tantalizing close to having dropped all mutexes (and
> similar global lock-like effects) from free_objects. That would be a nice
> victory.
> -Chris

* Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-15 20:09           ` Tang, CQ
@ 2020-10-15 20:32             ` Chris Wilson
  2020-10-15 22:25               ` Tang, CQ
  0 siblings, 1 reply; 11+ messages in thread
From: Chris Wilson @ 2020-10-15 20:32 UTC (permalink / raw)
  To: Tang, CQ, intel-gfx

Quoting Tang, CQ (2020-10-15 21:09:32)
> 
> 
> > -----Original Message-----
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > Sent: Thursday, October 15, 2020 8:07 AM
> > To: Tang, CQ <cq.tang@intel.com>; intel-gfx@lists.freedesktop.org
> > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> > high priority
> > 
> > Quoting Tang, CQ (2020-10-14 00:29:13)
> > > i915_gem_free_object() is called by multiple threads/processes, and
> > > they all add objects onto the same free_list. The free_list processing
> > > worker thread becomes the bottleneck. I see that the worker is mostly a
> > > single thread (with a particular thread ID), but sometimes multiple
> > > threads are launched to process the 'free_list' work concurrently. Even
> > > so, the processing speed is still slower than the feeding speed of the
> > > multiple processes, and 'free_list' is holding more and more memory.
> > 
> > We can also prune the free_list immediately, if we know we are outside of
> > any critical section. (We do this before create ioctls, and I thought upon
> > close(device), but I see that's just contexts.)
> > 
> > > The worker launch is delayed a lot: we call queue_work() when we add
> > > the first object onto the empty 'free_list', but by the time the worker
> > > runs, 'free_list' has sometimes accumulated 1M objects. Maybe that is
> > > because it waits for the currently running worker to finish?
> > 
> > 1M is a lot more than is comfortable, and that's even with a high-priority
> > worker.  The problem with objects being freed from any context is that we
> > can't simply put a flush_work around there. (Not without ridding ourselves of
> > a few mutexes at least.) We could try more than one worker, but it's no
> > more effort to starve 2 cpus than it is to starve 1.
> > 
> > No, with that much pressure the only option is to apply the backpressure at
> > the point of allocation ala create_ioctl. i.e. find the hog, and look to see if
> > there's a convenient spot before/after to call
> > i915_gem_flush_free_objects(). Since you highlight the vma-stash as the
> > likely culprit, and the free_pt_stash is unlikely to be inside any critical section,
> > might as well try flushing from there for starters.
> 
> I have not yet tested, but I guess calling i915_gem_flush_free_objects() inside free_pt_stash() will solve the problem that gem_exec_gttfill hits, because it will apply some backpressure on the system traffic.

Still I'm slightly concerned that so many PD objects are being created;
it's not something that shows up in the smem ppgtt tests (or at least
it's been dwarfed by other bottlenecks), and the set of vma (and so the
PD) is meant to reach a steady state. You would need to be using a
constant set of objects and recycling the vma, not to hit the
create_ioctl flush. However, it points back to the pressure point being
around the vma bind.

> But this is only for the page table 4K lmem objects allocated/freed by vma-stash. We might encounter the same situation with user space allocated objects.

See gem_exec_create; its mission is to cause memory starvation by
creating as many new objects as it can and releasing them after a nop
batch. That's why we have the freelist flush from create_ioctl.

Now I need to add a pass that tries to create as many vma from a few
objects as is possible.

(And similarly why we try to free requests as they are created.)

One problem is that they will catch the client after the hog, not
necessarily the hog itself.

I'm optimistic we can make freeing the object atomic, even if that means
pushing the pages onto some reclaim list. (Which is currently a really
nasty drawback of the free worker, a trick lost with the removal of
struct_mutex.)
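
One possible shape for that, as a purely hypothetical sketch (none of
these names exist in the driver):

	/* hypothetical container for backing pages awaiting reclaim */
	struct deferred_pages {
		struct llist_node link;
		struct sg_table *pages;
	};

	static void defer_page_reclaim(struct drm_i915_private *i915,
				       struct deferred_pages *dp)
	{
		/* llist_add() is safe from any context, so the object
		 * free itself never has to sleep or take a mutex; a
		 * worker (or the next allocator) reaps the list later. */
		llist_add(&dp->link, &i915->mm.page_reclaim_list); /* hypothetical */
	}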
-Chris

* Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue high priority
  2020-10-15 20:32             ` Chris Wilson
@ 2020-10-15 22:25               ` Tang, CQ
  0 siblings, 0 replies; 11+ messages in thread
From: Tang, CQ @ 2020-10-15 22:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



> -----Original Message-----
> From: Chris Wilson <chris@chris-wilson.co.uk>
> Sent: Thursday, October 15, 2020 1:33 PM
> To: Tang, CQ <cq.tang@intel.com>; intel-gfx@lists.freedesktop.org
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Make the GEM reclaim workqueue
> high priority
> > > We can also prune the free_list immediately, if we know we are
> > > outside of any critical section. (We do this before create ioctls,
> > > and I thought upon close(device), but I see that's just contexts.)
> > >
> > > > The worker launch is delayed a lot: we call queue_work() when we add
> > > > the first object onto the empty 'free_list', but by the time the
> > > > worker runs, 'free_list' has sometimes accumulated 1M objects. Maybe
> > > > that is because it waits for the currently running worker to finish?
> > >
> > > 1M is a lot more than is comfortable, and that's even with a
> > > high-priority worker.  The problem with objects being freed from any
> > > context is that we can't simply put a flush_work around there. (Not
> > > without ridding ourselves of a few mutexes at least.) We could try
> > > more than one worker, but it's no more effort to starve 2 cpus than
> > > it is to starve 1.
> > >
> > > No, with that much pressure the only option is to apply the
> > > backpressure at the point of allocation ala create_ioctl. i.e. find
> > > the hog, and look to see if there's a convenient spot before/after
> > > to call i915_gem_flush_free_objects(). Since you highlight the
> > > vma-stash as the likely culprit, and the free_pt_stash is unlikely
> > > to be inside any critical section, might as well try flushing from
> > > there for starters.
> >
> > I have not yet tested, but I guess calling i915_gem_flush_free_objects()
> > inside free_pt_stash() will solve the problem that gem_exec_gttfill hits,
> > because it will apply some backpressure on the system traffic.
> 
> Still I'm slightly concerned that so many PD objects are being created; it's not
> something that shows up in the smem ppgtt tests (or at least it's been
> dwarfed by other bottlenecks), and the set of vma (and so the
> PD) is meant to reach a steady state. You would need to be using a
> constant set of objects and recycling the vma, not to hit the create_ioctl flush.
> However, it points back to the pressure point being around the vma bind.
> 
> > But this is only for the 4K lmem page-table objects allocated/freed by
> > the vma stash. We might encounter the same situation with
> > userspace-allocated objects.
> 
> See gem_exec_create; its mission is to cause memory starvation by creating
> as many new objects as it can and releasing them after a nop batch. That's
> why we have the freelist flush from create_ioctl.
> 
> Now I need to add a pass that tries to create as many vma from a few objects
> as is possible.
> 
> (And similarly why we try to free requests as they are created.)
> 
> One problem is that they will catch the client after the hog, not necessarily
> the hog themselves.
> 
> I'm optimistic we can make freeing the object atomic, even if that means
> pushing the pages onto some reclaim list. (Which is currently a really nasty
> drawback of the free worker, a trick lost with the removal of
> struct_mutex.)

On a DG1 system with 4GB lmem and 7.7GB system memory, I can easily hit system OOM when running "gem_exec_gttfill --r all".

When I call i915_gem_flush_free_objects() inside i915_vm_free_pt_stash(), the test code passes without any problem.

With the simple test "gem_exec_gttfill --r all", I am also concerned about why the driver creates/frees so many vma-stash objects; by the time the system OOM happened, a total of 1.5M stash objects had been freed.

--CQ

> -Chris
